Have questions about the GEL or TOPMed Impute Data Release? Ask them here!

Chai Fungtammasan DNAnexus Team

19 December 2022 00:00
22 comments

The data should be released this week.
TOPMed data: https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=21007 (see the Notes tab for documentation)
GEL data: https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=21008 (see the Resources tab for documentation)

Comments

Chai Fungtammasan DNAnexus Team
- 19 December 2022 17:54
We noticed that the sample files for the BGEN data are not formatted correctly. We have notified UKB and the data provider, and this will be fixed in a future data release. In the meantime, if you want to analyze the GEL and TOPMed imputed datasets, the format issue is easy and very cheap to fix: change the second row of each sample file from 0 0 0 0 to 0 0 0 D. You can do this in an interactive workstation (e.g. ttyd, Cloud Workstation, JupyterLab, RStudio) or write a script that makes the change and run it with Swiss Army Knife.
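A minimal Python sketch of the fix described above, assuming the .sample file is a plain text file whose second line is the column-type row delivered as 0 0 0 0. The function names are hypothetical, and the sketch only touches that second line, leaving every sample row unchanged.

```python
def fix_sample_types(lines):
    """Return a copy of the .sample lines with the type row corrected.

    Line 1 is the header and line 2 holds one type code per column.
    Following the fix described in the thread, the row is changed to
    "0 0 0 D" only when it currently reads exactly "0 0 0 0".
    """
    fixed = list(lines)
    if len(fixed) > 1 and fixed[1].split() == ["0", "0", "0", "0"]:
        fixed[1] = "0 0 0 D"
    return fixed


def fix_sample_file(in_path, out_path):
    """Read a .sample file, correct its type row and write the result to out_path."""
    with open(in_path) as fh:
        lines = fh.read().splitlines()
    with open(out_path, "w") as fh:
        fh.write("\n".join(fix_sample_types(lines)) + "\n")
```

For example, fix_sample_file("chr22.sample", "chr22_fixed.sample") on a workstation, where both file names are placeholders for the actual files in your project.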

I manually changed the chr22 sample file in my test application and was able to use it with PLINK.

If anyone runs into other problems with these two datasets (or if the solution above doesn't work), please share with the community.

Former User of DNAx Community_6
- 04 January 2023 17:56
May I know how to access the TOPMed imputation files on UKB-RAP? I couldn't find them anywhere in the Bulk folder.

Chai Fungtammasan DNAnexus Team
- 04 January 2023 18:03
It seems that UKB has not yet granted research applications access to this data, which we released in mid-December. I can see it only in my test application, not in research applications. I will meet with them next week to find out what the issue is and fix it as soon as possible.
Once it's available, it will show up as two new folders in Bulk/Imputation: one for GEL and one for TOPMed.

Former User of DNAx Community_6
- 04 January 2023 19:10
Thank you so much and looking forward to using it.

Chai Fungtammasan DNAnexus Team
- 12 January 2023 00:44
@Akhil Pampana, the data has now been released. You can refresh the project to see it.
It seems that UKB unrestricted the data a while ago, but for some reason it took longer than expected to take effect.

Anastazie Sedlakova DNAnexus Team
- 18 January 2023 16:24
I made a short Python notebook that loops over the BGEN sample files for all chromosomes.
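The notebook itself is not reproduced here. As a rough sketch of the same idea, the loop below applies the type-row fix to one sample file per chromosome, overwriting each file in place. The file-name template and the chromosome list (1 to 22 plus X) are assumptions, so check them against the actual file names under Bulk/Imputation before running it; fix_all is a hypothetical name.

```python
from pathlib import Path

# Hypothetical naming pattern and chromosome list; adjust to the real files.
TEMPLATE = "chr{chrom}.sample"
CHROMS = [str(c) for c in range(1, 23)] + ["X"]


def fix_all(directory, template=TEMPLATE, chroms=CHROMS):
    """Fix the type row of every per-chromosome .sample file in `directory`.

    Files are overwritten in place. Missing files, and files whose second
    line is not exactly "0 0 0 0", are skipped, so rerunning is harmless.
    Returns the list of paths that were changed.
    """
    changed = []
    for chrom in chroms:
        path = Path(directory) / template.format(chrom=chrom)
        if not path.exists():
            continue
        lines = path.read_text().splitlines()
        if len(lines) > 1 and lines[1].split() == ["0", "0", "0", "0"]:
            lines[1] = "0 0 0 D"
            path.write_text("\n".join(lines) + "\n")
            changed.append(path)
    return changed
```

Because already-fixed files are skipped, the loop can be rerun safely after new files are added to the folder.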

Former User of DNAx Community_6
- 25 January 2023 20:22
Thank you so much for the resource. I was able to access the files. It's really helpful.

Former User of DNAx Community_69
- 13 February 2023 06:14
Hello,
Happy to see the TOPMed release for UKB.
It will greatly benefit our approved project.

However, even after refreshing the dataset as instructed, I still cannot access it.
I tried using gfetch 21007 with my approved key and got this error:

Error: Field=21007 is not permitted for download
Download failure

Can you please advise? I couldn't find anything specific in the community discussions.
Thank you.

Chai Fungtammasan DNAnexus Team
- 13 February 2023 06:28
Per the MTA, the TOPMed and GEL data must be analyzed on UKB-RAP only, so they cannot be downloaded from Showcase.

Former User of DNAx Community_69
- 13 February 2023 22:00
Thank you. I modified it for TOPMed.
But now I'm wondering how to actually run it.
Can I run it from the dx tools?

Former User of DNAx Community_69
- 13 February 2023 22:04
Thank you!

Chai Fungtammasan DNAnexus Team
- 13 February 2023 22:12
Yes, for this example you can use a Jupyter notebook to process them.
See this tutorial on running Jupyter notebooks on UKB-RAP: https://www.youtube.com/watch?v=YIPdhf3qbQA&list=PLRkZ0Fz-n3Z7Jg0Vz4vudLYnBza4EUGLM&index=21

Alternatively, you can copy just the code and run it in Python within the ttyd app.

Former User of DNAx Community_67
- 26 April 2023 15:52
Hello,

I have been trying to analyze the haplotypes of a specific subset of individuals from the UK Biobank. Ultimately, I want to compute LD within a specific genomic region and visualize it using tools such as Haploview.

What I have done so far:

1) Downloaded the TOPMed imputed data for the individuals of interest using the DNAnexus Swiss Army Knife tool.

2) Adjusted the .sample file according to Chai's comment above.

3) Filtered to the genomic region of interest. Btw, I observed the same issue reported here: rsIDs are missing from the .bgen file (https://community.dnanexus.com/s/question/0D5t000004SBxtyCAD/potential-issues-with-imputed-data).

I now want to compute LD and visualize haplotype blocks among all SNPs in this region.
- Is QCtool the best approach for this task? From its documentation, it's not clear to me how to calculate LD within a single genotype file. Should I pass the same .bgen and .sample files twice, for example:
qctool -g file.bgen -s file.sample -compute-ld-with file.bgen file.sample -old sqlite://results.sqlite:LD

It appears to work, but I want to make sure I'm getting correct results (I haven't added any other arguments yet; I just wanted to test the default options).
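One way to spot-check the qctool output for a handful of variant pairs is to compute r² directly from dosages. The sketch below takes the squared Pearson correlation of two dosage vectors; for unphased data this is a genotype-based approximation of LD rather than haplotype r², and it may not match the exact measure qctool reports. The function name is hypothetical, and extracting dosages from the .bgen file is left out.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two variants' dosage vectors.

    x and y hold one dosage (0 to 2) per individual, in the same sample
    order. Returns None when either variant has zero variance
    (monomorphic in this sample), where r^2 is undefined.
    """
    if len(x) != len(y) or not x:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Centered cross-product and sums of squares.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)
```

A perfectly correlated or perfectly anti-correlated pair gives 1.0, since r is squared.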

Other errors I have hit while manipulating the .bgen file to use it as input to different tools (all with the goal of generating input files for Haploview):
- plink: Error: '--export haps' must be used with a fully phased dataset.
- plink: error as reported in: https://www.biostars.org/p/9484011/#9484013
- Conversion to hap/sample/legend file using bcftools: same error as reported in https://github.com/samtools/bcftools/issues/730
- After conversion to .vcf, the genotypes appear as "0/0, 0/1, 1/1" instead of the expected output for phased genotypes (0|0, 0|1, etc.).
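In VCF, | separates the alleles of a phased genotype and / those of an unphased one, so a quick way to see what a converted file contains is to tally the separators in the GT field. A minimal sketch, assuming an uncompressed VCF read as text lines; the VCF specification requires GT to be the first FORMAT sub-field when present. The function name is hypothetical.

```python
def count_gt_separators(vcf_lines):
    """Tally phased ('|') and unphased ('/') genotype calls in VCF text lines.

    Header lines (starting with '#') are skipped, as are records whose
    FORMAT column (9th field) does not start with GT. Missing calls such
    as ./. are counted by their separator; calls with no separator
    (e.g. haploid '1') are counted as 'other'.
    """
    counts = {"phased": 0, "unphased": 0, "other": 0}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue
        for sample in fields[9:]:
            gt = sample.split(":", 1)[0]
            if "|" in gt:
                counts["phased"] += 1
            elif "/" in gt:
                counts["unphased"] += 1
            else:
                counts["other"] += 1
    return counts
```

If nearly every call lands under "unphased", the converted file carries no phase information; comparing against the source BGEN would still be needed to tell whether phase was lost in conversion or never present.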
I guess my questions are:

a) Given the errors above, is the TOPMed data (field 21007) phased, as I am assuming? If so, is there any chance I lost the phase information while downloading the data for the individuals of interest?

b) For this type of analysis, are the files in Bulk > Imputation (22828, 21007, 21008) the right ones to use? The definition of field 22438 is also a bit unclear to me.

c) If the qctool command above is the right approach, could anyone help me figure out how to visualize the results stored in "results.sqlite"? I'm not very familiar with manipulating SQLite files.

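For question (c), the SQLite output can be inspected with Python's standard library before choosing a plotting tool. The sketch below makes no assumption about the table layout qctool writes: it lists the tables and views in the file and pulls the column names and first rows from one of them. The function names are hypothetical.

```python
import sqlite3


def list_objects(conn):
    """Return (name, type) pairs for all tables and views in the database."""
    cur = conn.execute(
        "SELECT name, type FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    )
    return cur.fetchall()


def read_object(conn, name, limit=10):
    """Return (column_names, rows) for the first `limit` rows of a table or view."""
    # Identifiers cannot be bound as parameters, so quote the name explicitly.
    quoted = '"' + name.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()
```

Usage: open the file with conn = sqlite3.connect("results.sqlite"), print list_objects(conn), then call read_object on the table that holds the LD values. Once the relevant columns are known, the rows can be passed to any plotting library to draw an LD heatmap.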
Thank you very much for any insights from this community.

Chai Fungtammasan DNAnexus Team
- 26 April 2023 17:40
Could you repost this as a new question? These are pretty hard questions, so I'd like to see whether other community members can chime in.
I do want to note, though, that phased WGS data for the 200k WGS samples will be coming out around July this year.

Former User of DNAx Community_67
- 26 April 2023 17:50
Thanks for replying, Chai. Will do. I'm glad to hear there will be a phased WGS data release soon.

Chai Fungtammasan DNAnexus Team
- 26 April 2023 19:18
You are welcome. If you are interested in phasing data, you may find these two talks useful.
https://www.youtube.com/watch?v=jF2GKfrWaz4&t=8s
https://www.youtube.com/watch?v=iNtg9PuYj4g&t=1s

Former User of DNAx Community_67
- 28 April 2023 01:09
Awesome. Thanks for sharing these talks. I've just watched them and they were super informative. Excited for the phased WGS release in the next few months.

Former User of DNAx Community_28
- 16 June 2023 16:26
I updated my GWAS repo for TOPMed imputed data using PLINK. I will work on adding a regenie version in the near future.

https://github.com/pjgreer/ukb-rap-tools/tree/main/GWAS_pipeline/gwas_topmed_plink

I have a separate question: I see there is a paper on the GEL imputation methods in the Showcase, but there does not seem to be one for the TOPMed imputation. Has anyone seen such a paper yet?

Chai Fungtammasan DNAnexus Team
- 16 June 2023 16:42
Do the Notes and Resources sections of https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=21007 contain the information you are looking for?

Former User of DNAx Community_28
- 16 June 2023 16:59
Chai,

No, that is really the bare minimum information.

The original HRC imputation paper (https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=530) and the GEL PDF (https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=10510) are the kind of documentation I am looking for. The TOPMed equivalent just doesn't seem to exist yet.

Specifically: how many SNPs passed QC to be submitted to the imputation server? How large were the batches (HRC: 4,700 per batch; GEL: 26K per batch)? Did they submit batches by reported ancestry? Etc.

Chai Fungtammasan DNAnexus Team
- 16 June 2023 17:14
Thanks for this note, Phil. I will pass this request on to UKB.

Felix Vaura
- 31 October 2025 08:06
Hello,
Any news on the TOPMed QC details?
Best,
Felix
