New Data Release on UKB-RAP: WGS Data from 500,000 Participants, Proteomics Data & Updated Imaging Data
The 500k Whole Genome Sequencing data is now available on the UK Biobank Research Analysis Platform (UKB-RAP). This completes UK Biobank's ambitious Whole Genome Sequencing initiative, replacing the interim 200k Whole Genome Sequencing dataset made available in November 2021.
This data release includes population-level (pVCF) and individual-level (CRAM, gVCF and more) Whole Genome Sequencing data for all UK Biobank participants. It does not include PLINK or BGEN formatted files; these should be available in the first release of 2024. To view the full details of the data in this November release, please visit the UK Biobank website.
The proteomic data covering levels of 3,000 proteins in 56,000 people, together with updated imaging files (both of which were included within Data Showcase as part of the October 2023 release), are now also available through UKB-RAP.
Please note: we are expecting a great amount of interest in this new data, and there may be long queues for data dispensing. We have been working hard to help improve your experience and we would strongly encourage you to read our FAQ before dispensing any data within UKB-RAP.
Given the UK Biobank dataset now comprises about 30 petabytes of data, with nearly 18 million individual-level and 600,000 population-level files, we are introducing a new function as part of the data dispensal process. You can select which elements of the data to dispense (i.e. population-level files and/or individual-level files), so that you only dispense the data you need. This will make the system more efficient for everyone. This function is now part of the standard data dispensing process.
We are stepping into the unknown in terms of both the scale of data and demand. If you have any questions, please direct them to the very helpful members of the online community forum, or otherwise contact ukbiobank-support@dnanexus.com. If you have an access-related query, such as changing to tier 3 to be able to access this data, then please raise a ticket.
Comments
The data refresh section for my application indicates that requesting the WGS data is not currently possible. Can you please clarify how we may request access within pre-existing projects?
You will have to create a new project to dispense the data to. You can follow the steps listed in the FAQ: https://www.ukbiobank.ac.uk/media/dovbae03/uk-biobank-final-whole-genome-sequencing-release-faqs_v1-0.pdf
In that case, perhaps that document can be clarified, because that document currently states that you can select the data after the project has been created:
The WGS FAQ actually is clear about this, but it contradicts the PDF that you linked: https://dnanexus.gitbook.io/uk-biobank-rap/500k-wgs-faq
Thank you for bringing this to our attention and we'll be clarifying the FAQ. You will need to create a new project in order to access the data. You have the option to dispense the data on project creation or later in the project settings of that new project.