Is there a way to download phase 1 release and phase 2 release genotype calls separately? Thanks!

Former User of DNAx Community_75

19 February 2023 00:00
9 comments

Comments


Anastazie Sedlakova DNAnexus Team
- 20 February 2023 10:26
What is the field ID that you want to download?

Former User of DNAx Community_75
- 20 February 2023 20:32
The genotype calls (field 22418) and the imputed data (all methods) as well. I have already downloaded the genotype files, so I can use the participant IDs for each phase to parse them on my end. However, if there is an easier way to do this with a new download, I can do that as well.

I'm just getting started on the imputation files and have a quick question on how best to use them:
1) I'm recreating a dataset used in Khera et al., 2018, and they used imputed data from all three. Is there a workflow that combines these, and are there overlapping SNPs in each of the datasets?

Many thanks for your help.

Keri

Chai Fungtammasan DNAnexus Team
- 21 February 2023 01:16
The newly released imputed data (GEL and TOPMed) cannot be downloaded, per the MTA with UKB.

I'm not aware of a pipeline to compare the same EID across all of them. I just know that you have to make sure to use the EID in the sample file rather than the one within the BGEN. Also, the array and original imputed data are in GRCh37, while the rest of the main genomics datasets are in GRCh38.
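
As a minimal sketch of the point about EIDs: the function below reads participant EIDs from an Oxford-format .sample file, in file order, so they can be lined up with the samples in the matching BGEN. The column name `ID_1`, the two header lines, and the example rows are assumptions for illustration (the IDs are made up); check them against your own .sample file.

```python
from typing import Iterable, List


def read_sample_eids(lines: Iterable[str]) -> List[str]:
    """Return EIDs (the ID_1 column) from the lines of a .sample file.

    Assumes the Oxford .sample layout: line 1 holds column names,
    line 2 holds column types, and every later line is one participant.
    File order is preserved so that row i can be matched to sample i
    of the corresponding BGEN file.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[2:]  # skip the column-type line
    id_col = header.index("ID_1")
    return [row[id_col] for row in data]


# Made-up example content; real files are read with open(path).
example_sample = """ID_1 ID_2 missing sex
0 0 0 D
1000011 1000011 0 1
1000025 1000025 0 2
"""

eids = read_sample_eids(example_sample.splitlines())
print(eids)
```

For a file on disk, `read_sample_eids(open(path))` works the same way, since a file object iterates over its lines.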

Anastazie Sedlakova DNAnexus Team
- 21 February 2023 08:51
With some quick googling I found a tutorial on how to extract common variants from two sets of PLINK files. This tutorial uses PLINK, which is installed in the Swiss Army Knife tool on UKB RAP. Alternatively, you can install PLINK on a cloud workstation or ttyd (web-based terminal).
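
For illustration only, and not the tutorial's own method: a short sketch that finds variant IDs shared by two PLINK filesets by comparing their .bim files. It assumes the standard .bim layout, with the variant ID in the second whitespace-separated column. The function names and example rows are hypothetical.

```python
from typing import Iterable, List


def bim_variant_ids(lines: Iterable[str]) -> List[str]:
    """Return variant IDs (2nd column) from the lines of a .bim file."""
    return [line.split()[1] for line in lines if line.strip()]


def common_variants(bim_a: Iterable[str], bim_b: Iterable[str]) -> List[str]:
    """Variant IDs present in both filesets, in the order of the first."""
    ids_in_b = set(bim_variant_ids(bim_b))
    return [vid for vid in bim_variant_ids(bim_a) if vid in ids_in_b]


# Made-up .bim rows: chromosome, variant ID, cM, bp position, allele 1, allele 2
bim_a = [
    "1 rs1 0 1000 A G",
    "1 rs2 0 2000 C T",
    "1 rs3 0 3000 G A",
]
bim_b = [
    "1 rs2 0 2000 C T",
    "1 rs3 0 3000 G A",
    "1 rs4 0 4000 T C",
]

shared = common_variants(bim_a, bim_b)
print(shared)
```

The resulting list can be written one ID per line and passed to PLINK's `--extract` option. Matching on ID alone does not check that alleles or genome build agree, which matters when one dataset is in GRCh37 and the other in GRCh38.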

Former User of DNAx Community_75
- 21 February 2023 20:28
Thanks for the reply and information. What I'm looking for is a list of the people in the phase 1 release and a list of the people in the phase 2 release of the UK Biobank data. Do you know where I might find this info?

Quotes from the publication may help with clarity:
" ... a validation dataset of 120,280 participants of European ancestry derived from the UK Biobank phase 1 release. " is used as a training set and "The testing dataset was comprised of 288,978 UK Biobank phase 2 genotype data release participants distinct from those in the training dataset described above". Does that make sense? I'm trying to recreate these datasets for my research.

Thank you!

Chai Fungtammasan DNAnexus Team
- 21 February 2023 20:34
I see. I don't know where to find that list. You may contact UKB directly or ask the authors of the paper.

Former User of DNAx Community_75
- 21 February 2023 20:38
Ok. Thank you.

Anastazie Sedlakova DNAnexus Team
- 22 February 2023 11:57
@Keri Multerer, I am assuming that you are citing this article. I was able to find documentation for the interim genotype result in the Showcase documentation, but it is not connected with any data field.

Former User of DNAx Community_75
- 22 February 2023 20:38
Thank you! This was a great suggestion, but it doesn't have the data I'm looking for. I've reached out to UK Biobank directly and may need to go to plan B, which is randomizing participants into two groups with sizes equal to those in the publication. Theoretically, with these large numbers, this should be OK to do. Thanks for all of your help.
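
A minimal sketch of plan B, assuming a list of eligible participant EIDs is already in hand: draw two disjoint random groups of fixed sizes with a fixed seed so the split is reproducible. The function name, seed, and toy IDs are hypothetical. With the real data, the sizes would be 120,280 and 288,978, as quoted from the publication.

```python
import random
from typing import List, Sequence, Tuple


def split_participants(
    eids: Sequence[str], n_first: int, n_second: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly draw two non-overlapping groups of the requested sizes."""
    total = n_first + n_second
    if total > len(eids):
        raise ValueError("requested group sizes exceed the number of participants")
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    drawn = rng.sample(list(eids), total)  # sampling without replacement
    return drawn[:n_first], drawn[n_first:]


# Toy example with made-up IDs; replace with the real EID list and sizes.
toy_eids = [str(1000000 + i) for i in range(1000)]
group_one, group_two = split_participants(toy_eids, 300, 600, seed=42)
print(len(group_one), len(group_two))
```

Participants not drawn into either group are simply left out. A random split like this matches the publication's group sizes only; it would not preserve any systematic differences between the two releases.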

