UK Biobank provides a range of resources, including example notebooks and comprehensive guides, to help researchers understand how to use the UK Biobank Research Analysis Platform (UKB-RAP).
The UK Biobank GitHub serves as a great starting point, offering code examples and insights to support effective research and data analysis on the platform.
The UK Biobank GitHub includes a collection of repositories designed to guide researchers in accessing, extracting, and analysing UK Biobank data. These repositories cover a wide range of topics, with specialised resources for handling genetic and phenotypic data.
The notebooks in these repositories offer practical examples that help researchers get started with data processing on the platform and learn to apply the available tools to UK Biobank data.
Repositories currently available
Further information on the content of each notebook can be found in the ReadMe associated with each repository.
UKB-RAP-Notebooks-Access
This repository includes notebooks on performing basic operations to access and analyse UKB phenotypic data. This data is held in a Spark dataset on the platform.
The A-series (Accessing Data) notebooks in this repository give an overview of how to access and manipulate the phenotypic databases on the UKB-RAP that are likely to be linked with genomic or other resources. The notebooks are written in Python, R and Bash, and are available for both JupyterLab and RStudio.
Figure 1
Content:
A101 Explore phenotype tables (language = Python; instance = Spark)
A102 Explore participant data (Python; Spark)
A103 Export participant data (Python; Spark)
A104 Explore phenotype tables (R; Single Node)
A105 Export participant data (R; Spark)
A106 Hypertension data (R; Spark)
A107 OMOP data: hypertension case-study (R; Spark)
A108 Constructing the OLINK dataset (R; Spark)
A109 Find imaging bulk files (Bash; Python)
A110 Export participant data (R; RStudio)
A111 Import and Analyse participant data (R; RStudio)
________________________________________________________________________________________________________________
UKB-RAP-Notebooks-Genomics
This repository includes notebooks that focus on genomic analysis workflows on the RAP.
These notebooks illustrate how to perform many standard analyses (e.g., GWAS, population genomics, functional annotation) that are typically employed in bioinformatic studies.
The notebooks running down the centre of Figure 2 form the core set of analytical workflows. As you run through them, you will also create files required for subsequent workflows (purple and blue arrows; Figure 2). A number of these analyses also depend on data files created and uploaded to your RAP project in the A-series (Accessing Data) notebooks (black arrows; Figure 2). Figure 2 indicates which A-series notebooks must be run to create the requisite files for each analysis; for example, A-series notebook A103 must be run before the G202 GWAS analysis. Finally, the repository contains preliminary genomics notebooks (G101 and G102) that outline basic methods for handling genomic data files; G102 also provides genomic input files (green arrows) for a number of the main analytical workflows (Figure 2).
Figure 2
Content:
Genomics preliminary workflows
G101 UKB pipeline pVCF to PLINK (language = Bash; instance = Single Node)
G102 Processing variant data using PLINK (Bash; Single Node)
G103 Retrieve participant data for Hail GWAS (Bash; Spark)
Genomics analytical workflows
G201 Population structure (PCA) ethnicity (R; Spark)
G202 GWAS participant height (R; Single Node)
G203 GWAS hypertension (R; Single Node)
G204 Polygenic risk scores of participant height (R; Single Node)
G205 Polygenic risk scores for hypertension (R; Single Node)
G206 Annotate SNPs from dbSNP and profile ontologies (R; Single Node)
G207 Functional annotations of variants (R; Single Node)
G208 GWAS in Hail (Python; Spark/Hail-VEP)
________________________________________________________________________________________________________________
UKB-RAP-Workflows
This repository focuses on executing complex, multi-stage workflows on the UKB-RAP.
By using tools such as apps, applets and Workflow Description Language (WDL), researchers can create scalable, parallelised and portable workflows that allow computational resources to be controlled and optimised for large-scale analyses.
Content:
WDL-vcf2bin: Conversion of text-formatted population variants (pVCF) to binary formats (PLINK and BGEN) for 200k GraphTyper call.
- This workflow was used by UK Biobank to convert the whole genome sequencing 200k pVCFs into PLINK and BGEN binary files.
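The workflow itself is written in WDL and is not reproduced here. As a rough single-file illustration of the same kind of conversion, the sketch below prints PLINK 2 commands that would turn one pVCF into PLINK and BGEN binaries. The choice of PLINK 2, the helper names and the file names are all assumptions made for illustration; they are not taken from the WDL-vcf2bin repository.

```shell
# Sketch only: prints commands rather than running them.
# Assumptions: PLINK 2 is installed as `plink2`; file names are placeholders.

# Print a command converting a pVCF to PLINK binary files (.bed/.bim/.fam).
vcf_to_plink_cmd() {
    # $1 = input pVCF, $2 = output prefix
    printf 'plink2 --vcf %s --make-bed --out %s\n' "$1" "$2"
}

# Print a command converting a pVCF to BGEN v1.2.
vcf_to_bgen_cmd() {
    # $1 = input pVCF, $2 = output prefix
    printf 'plink2 --vcf %s --export bgen-1.2 --out %s\n' "$1" "$2"
}

vcf_to_plink_cmd example_chunk.vcf.gz example_chunk
vcf_to_bgen_cmd example_chunk.vcf.gz example_chunk
```

Printing the commands first makes it easy to check paths and options before committing compute time; the printed lines can then be run directly in a terminal where PLINK 2 is available.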
________________________________________________________________________________________________________________
SNP-filtering
This repository contains a JupyterLab notebook allowing individual SNPs to be filtered from the UKB genotyping data.
A few approaches to filtering are provided in the notebook, including options to filter by individual SNP rsIDs or by genomic regions of interest.
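To make the two filtering styles concrete, here is a minimal sketch that prints PLINK 2 commands for filtering by a list of rsIDs and by a genomic region. It is not the notebook's own code: the use of PLINK 2, the helper names and the file names are assumptions for illustration, and the notebook may take a different approach.

```shell
# Sketch only: prints commands rather than running them.
# Assumptions: PLINK 2 is installed as `plink2`; file names are placeholders.

# Print a command that keeps only the variants whose IDs are listed
# (one rsID per line) in a text file.
filter_by_rsid_cmd() {
    # $1 = PLINK fileset prefix, $2 = file of rsIDs, $3 = output prefix
    printf 'plink2 --bfile %s --extract %s --make-bed --out %s\n' "$1" "$2" "$3"
}

# Print a command that keeps variants inside a base-pair window
# on a single chromosome.
filter_by_region_cmd() {
    # $1 = fileset prefix, $2 = chromosome, $3 = start bp, $4 = end bp, $5 = output prefix
    printf 'plink2 --bfile %s --chr %s --from-bp %s --to-bp %s --make-bed --out %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

filter_by_rsid_cmd example_chr1 my_rsids.txt example_chr1_subset
filter_by_region_cmd example_chr1 1 1000000 2000000 example_chr1_region
```

Filtering by rsID suits a short, known list of variants, whereas the region form is more convenient when you want everything around a locus of interest.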
________________________________________________________________________________________________________________
Using the resources on the GitHub
To make the most of the resources available on the UK Biobank GitHub, you can clone a repository directly into the UKB-RAP environment using the JupyterLab or RStudio terminal.
Here is a simple guide to get started:
1. Open the JupyterLab or RStudio terminal from the tools library in the UKB-RAP.
2. Use the following command to clone the repository:
git clone https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access.git
To clone a different repository, replace "UKB-RAP-Notebooks-Access" in the URL with the name of that repository.
3. Navigate to the cloned repository using the terminal or file explorer to access the resources.
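If you want several repositories at once, the clone step can be wrapped in a small loop. The sketch below assumes that the repository names match the section headings in this article; please check them on the UK Biobank GitHub before running. The `clone_url` helper and the `DRY_RUN` switch are hypothetical conveniences for this example, not part of any UK Biobank tooling.

```shell
# Sketch: clone several UK Biobank repositories in one go.
# Assumption: the repository names below match those on the UK Biobank GitHub.
BASE_URL="https://github.com/UK-Biobank"
REPOS="UKB-RAP-Notebooks-Access UKB-RAP-Notebooks-Genomics UKB-RAP-Workflows SNP-filtering"

# Build the clone URL for one repository name.
clone_url() {
    printf '%s/%s.git\n' "$BASE_URL" "$1"
}

# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to clone.
DRY_RUN="${DRY_RUN:-1}"

for repo in $REPOS; do
    url="$(clone_url "$repo")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "git clone $url"
    elif [ -d "$repo" ]; then
        # Avoid a git error when the repository was cloned earlier.
        echo "Skipping $repo: directory already exists"
    else
        git clone "$url"
    fi
done
```

Run it once as written to review the commands, then again with `DRY_RUN=0` set in the terminal to perform the clones.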
Where to Go for Help
If you encounter any issues or errors while running the code provided, we encourage you to file an issue on the UK Biobank GitHub.
For any other queries, please contact the Access Team by submitting a ticket.
Suggest something not covered on the UK Biobank GitHub
We are always looking for ways to improve and expand our resources. If there is something you would like to see that isn’t currently covered on our GitHub repository, we welcome your suggestions! Simply send your requests and ideas by submitting a ticket to the Access Team, and they will review and consider them for future updates and enhancements.