AMA: Lora B, UK Biobank Data Analyst

Lora B

UKB Community team Data Analyst

15 August 2022 00:00
8 comments

Hi everyone, I'm a data analyst at UK Biobank. Ask me anything about our dataset!

Comments

Former User of DNAx Community_7
- 15 August 2022 15:08
Hi Lora, for context: I'm trying to accomplish what is set out in this post, which is basically to analyze some of the pVCFs using HAIL.

I've now managed to locate the pVCF that I need using this file from the UK Biobank. I tried loading the file into HAIL within Jupyter notebook with the following code (not showing the code I used to load HAIL itself):

hl.import_vcf('/mnt/project/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/ukb23157_c5_b4_v1.vcf.gz').write('ukb23157_c5_b4_v1.mt', overwrite=True)

But I get an error saying that the file doesn't exist.

I thought that maybe this is some problem with the file being gzipped. To get around this, I tried to extract the file using a terminal with the following command:

gunzip -c ukb23157_c5_b4_v1.vcf.gz > /ukb23157_c5_b4_v1.vcf

But although the command appears to complete, I don't see a file when it does. I presume that the system automatically deleted the file.

I also tried to write it to /mnt/project, but then I got an error saying the filesystem is read-only.

Could you please suggest how to go about this?

Best,
Jeremy

Lora B UKB Community team Data Analyst
- 15 August 2022 15:18
Hi Jeremy - thank you for your question! I might need to get some advice from our bioinformatician on best approaches to using the pVCF files with HAIL - will update!

Former User of DNAx Community_7
- 15 August 2022 15:21
Thank you very much! I think I could also benefit from understanding read/write permissions to the filesystem. Is there a place I could gunzip the vcf.gz file to?

Lora B UKB Community team Data Analyst
- 15 August 2022 15:31
Hi Jeremy

After some bioinformatics advice, our guess is that the problem may be due to HAIL not having access to the dx fuse system, which lets you use the /mnt/project area directly. An alternative approach would be to download the pVCF of interest to the local instance using the dx download command, e.g.:

dx download "/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/ukb23157_c5_b4_v1.vcf.gz"

You will then have a copy on your local system that you can use with HAIL. As for unzipping the file, the problem may again be that dx fuse is read-only; writing should be performed using dx upload. If you download the zipped file to your local instance, you should be able to unzip it locally. Hope that helps!
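
A minimal sketch of the download-then-import route suggested above, written in Python so it can run in the same Jupyter notebook. The helper names (dx_download_command, local_file_uri, download_and_import) are hypothetical and are not part of dx-toolkit or Hail. The sketch assumes the dx command-line client is on the PATH, that the target directory already exists, that the pVCF is block-gzipped (hence force_bgz=True) and that it uses GRCh38 coordinates. The explicit file:// prefix is also an assumption: on a Spark cluster, Hail may otherwise resolve a bare path against the cluster's default filesystem rather than the local disk.

```python
import subprocess
from pathlib import Path, PurePosixPath

# Project path quoted in the reply above.
REMOTE_VCF = (
    "/Bulk/Exome sequences/Population level exome OQFE variants, "
    "pVCF format - final release/ukb23157_c5_b4_v1.vcf.gz"
)


def dx_download_command(remote_path, local_dir):
    """Argument list for `dx download`; passing a list avoids shell quoting
    problems with the spaces and commas in the project path."""
    return ["dx", "download", remote_path, "-o", str(local_dir)]


def local_file_uri(remote_path, local_dir):
    """file:// URI of the local copy that `dx download` is expected to create."""
    local_path = Path(local_dir).resolve() / PurePosixPath(remote_path).name
    return f"file://{local_path}"


def download_and_import(remote_path, local_dir, out_mt):
    """Download the pVCF to local storage, then convert it to a Hail MatrixTable."""
    import hail as hl  # imported here so the helpers above work without Hail

    subprocess.run(dx_download_command(remote_path, local_dir), check=True)
    mt = hl.import_vcf(
        local_file_uri(remote_path, local_dir),
        force_bgz=True,             # assumption: the .vcf.gz is block-gzipped
        reference_genome="GRCh38",  # assumption: GRCh38 coordinates
    )
    mt.write(out_mt, overwrite=True)
    return out_mt
```

Calling download_and_import(REMOTE_VCF, ".", "ukb23157_c5_b4_v1.mt") would mirror the original attempt, but reading from a local copy instead of /mnt/project. Anything written on the local instance would still need dx upload to be kept in the project.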

Former User of DNAx Community_8
- 15 August 2022 15:35
Hello, and thanks for doing this Q&A session. My question is about the blood lipid level data in UKB. We have noticed that:
1. The UKB-performed lipid panel data from the "blood biochemistry" testing suggests a very high prevalence of dyslipidemia among UKB participants
2. The "NMR Metabolomics" data indicates a much lower prevalence
3. The NMR metabolomics testing includes two tests, "LDL" and "Clinical LDL", with very different results, and we can't find any documentation of the differences.
Can you help sort any of this out?

Former User of DNAx Community_7
- 15 August 2022 15:35
Great! On a broader level, I guess I don't understand how the dx fuse system works. I will try your suggestions later today and see what documentation I can find on dx fuse. Thank you all!

Lora B UKB Community team Data Analyst
- 15 August 2022 15:48
Hi Eric,

Thank you for your question! Generally, the UKB cohort comprises older individuals, which may account for the high prevalence of dyslipidaemia; is the prevalence higher than you would expect for a cohort of that age distribution? The blood biochemistry panel is available for a larger proportion of the cohort compared to NMR metabolomics data, which may explain the differences in prevalence between the two. Further details on the NMR metabolomics can be found under UK Biobank resource 3000 (https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=3000).
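
One way to check whether coverage explains the gap is to compute both prevalences on the same participants, i.e. only those who have both a blood biochemistry value and an NMR value. The following is a rough sketch of that comparison, not an official UK Biobank method. The function names, the dict-of-values layout and the single shared threshold are all hypothetical; real field IDs, units and clinical cut-offs would need to come from the UK Biobank documentation.

```python
def prevalence(values, threshold):
    """Fraction of non-missing values strictly above the threshold."""
    measured = [v for v in values if v is not None]
    if not measured:
        return float("nan")
    return sum(v > threshold for v in measured) / len(measured)


def prevalence_on_overlap(biochem, nmr, threshold):
    """Compare prevalence in the two panels on participants measured in both.

    biochem and nmr map a participant ID to a lipid value (or None if missing).
    Returns (biochem prevalence, NMR prevalence, number of shared participants).
    """
    shared = sorted(
        pid
        for pid in biochem.keys() & nmr.keys()
        if biochem[pid] is not None and nmr[pid] is not None
    )
    return (
        prevalence([biochem[pid] for pid in shared], threshold),
        prevalence([nmr[pid] for pid in shared], threshold),
        len(shared),
    )
```

If the two prevalences still differ on the shared participants, coverage alone would not account for the discrepancy, and the measurement methods themselves would be the next thing to compare.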

Lora B UKB Community team Data Analyst
- 15 August 2022 15:49
Thank you for your questions! Other members of the UK Biobank Data Analyst Team will be around to answer your questions later this week!
