AMA: Lora B, UK Biobank Data Analyst

Lora B (UKB Community team, Data Analyst)
Hi everyone, I'm a data analyst at UK Biobank. Ask me anything about our dataset!

Comments

  • Hi Lora! For context, I'm trying to accomplish what is set out in this post, which is basically to analyze some of the pVCFs using Hail.

    I've now managed to locate the pVCF that I need using this file from UK Biobank. I tried loading the file into Hail within a Jupyter notebook with the following code (not showing the code I used to import and initialize Hail itself):

    hl.import_vcf('/mnt/project/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/ukb23157_c5_b4_v1.vcf.gz').write('ukb23157_c5_b4_v1.mt', overwrite=True)

    But I get an error saying that the file doesn't exist.

    I thought the problem might be that the file is gzipped. To get around this, I tried to extract the file in a terminal with the following command:

    gunzip -c ukb23157_c5_b4_v1.vcf.gz > /ukb23157_c5_b4_v1.vcf

    Although the command appears to complete, I don't see the output file afterwards. I presume the system deleted it automatically.

    I also tried writing it to /mnt/project, but then I got an error saying the filesystem is read-only.

    Could you please suggest how to go about this?

    Best,
    Jeremy

  • Lora B (UKB Community team, Data Analyst)

    Hi Jeremy, thank you for your question! I may need to get some advice from our bioinformatician on the best approaches to using the pVCF files with Hail. Will update!

  • Thank you very much! I think I could also benefit from understanding read/write permissions on the filesystem. Is there a place I could gunzip the vcf.gz file to?

  • Lora B (UKB Community team, Data Analyst)

    Hi Jeremy,

    After some bioinformatics advice, our guess is that Hail may not have access to the dxfuse mount, which is what lets you use the /mnt/project area directly. An alternative approach would be to download the pVCF of interest to the local instance using the dx download command, e.g. dx download "/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/ukb23157_c5_b4_v1.vcf.gz". You will then have a local copy that you can use with Hail. As for unzipping the file, the problem again may be that the dxfuse mount is read-only; writing back to the project should be done with dx upload. If you download the zipped file to your local instance, you should be able to unzip it there. Hope that helps!
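    A minimal Python sketch of the download-then-import route described above, for running from a Jupyter notebook on the instance. Several things here are assumptions rather than facts confirmed in this thread: that the dx toolkit (which provides dx download) is on PATH, that the home directory is writable, that the pVCF is block-gzipped so Hail can read it without unzipping via force_bgz=True, and that the calls are on GRCh38. The helper names dx_download_cmd, local_uri and download_and_import are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Project path from the answer above. Passing arguments as a list (no shell)
# means the spaces and commas in the folder names need no extra quoting.
REMOTE_PVCF = (
    "/Bulk/Exome sequences/"
    "Population level exome OQFE variants, pVCF format - final release/"
    "ukb23157_c5_b4_v1.vcf.gz"
)


def dx_download_cmd(remote_path: str, local_dir: Path) -> List[str]:
    """Hypothetical helper: argument list for `dx download`; -o names the local file."""
    local_path = local_dir / Path(remote_path).name
    return ["dx", "download", remote_path, "-o", str(local_path)]


def local_uri(local_path: Path) -> str:
    """Hypothetical helper: absolute local path -> explicit file:// URI for Hail."""
    if not local_path.is_absolute():
        raise ValueError(f"expected an absolute path, got {local_path}")
    return "file://" + str(local_path)


def download_and_import(local_dir: Optional[Path] = None) -> None:
    """Copy the pVCF onto the instance, then convert it to a Hail MatrixTable."""
    local_dir = local_dir or Path.home()  # assumption: home directory is writable
    subprocess.run(dx_download_cmd(REMOTE_PVCF, local_dir), check=True)

    import hail as hl  # lazy import: the helpers above work without Hail installed

    # Assumes hl.init(...) already ran earlier in the notebook, as in the post.
    mt = hl.import_vcf(
        local_uri(local_dir / Path(REMOTE_PVCF).name),
        force_bgz=True,  # assumption: the .vcf.gz is BGZF, not plain gzip
        reference_genome="GRCh38",  # assumption: the exome calls are on GRCh38
    )
    mt.write("ukb23157_c5_b4_v1.mt", overwrite=True)  # output name from the post
```

    Calling download_and_import() in a notebook cell would then stand in for the original one-liner. On a multi-node Spark cluster, a file on the driver's local disk may not be visible to the worker nodes, so this route is most straightforward on a single-node instance.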

  • Hello, thanks for doing this Q&A session. My question is about the blood lipid level data in UKB. We have noticed that:

    1. The UKB-performed lipid panel data from the "blood biochemistry" testing suggests a very high prevalence of dyslipidemia among UKB participants.

    2. The "NMR Metabolomics" data indicates a much lower prevalence.

    3. The NMR metabolomics testing includes two tests, "LDL" and "Clinical LDL", with very different results, and we can't find any documentation of the differences.

    Can you help sort any of this out?

  • Great. On a broader level, I guess I don't understand how the dxfuse system works. I'll try your suggestions later today and see what documentation I can find on dxfuse. Thank you all!
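    On the read/write question: the thread indicates that /mnt/project (the dxfuse mount) is read-only, but it can still be read from. A minimal Python sketch, assuming a writable local directory such as the home directory exists on the instance, that reads the compressed pVCF straight from the mount and writes the decompressed copy elsewhere. Python's gzip module reads files made of several concatenated gzip members, which is how block-gzipped (BGZF) VCFs are laid out. The helper names is_writable and gunzip_to are hypothetical.

```python
import gzip
import os
import shutil
from pathlib import Path


def is_writable(directory: Path) -> bool:
    """Quick pre-check that the current user may create files in `directory`.

    On a read-only mount os.access(..., os.W_OK) should report False, but the
    definitive test is whether the write itself succeeds.
    """
    return directory.is_dir() and os.access(directory, os.W_OK)


def gunzip_to(src: Path, dest_dir: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream-decompress `src` into `dest_dir` and return the new file's path.

    `src` only has to be readable (it can sit on the read-only /mnt/project
    mount); `dest_dir` must be writable. gzip.open transparently reads
    concatenated gzip members, so BGZF input works as well as plain gzip.
    """
    if not is_writable(dest_dir):
        raise PermissionError(f"{dest_dir} is not writable")
    name = src.name[:-3] if src.name.endswith(".gz") else src.name + ".out"
    dest = dest_dir / name
    # Copy in fixed-size chunks so the whole VCF never has to fit in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest
```

    For example, gunzip_to(Path("/mnt/project/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/ukb23157_c5_b4_v1.vcf.gz"), Path.home()) would read from the mount and write ukb23157_c5_b4_v1.vcf to the home directory. Using the full mount path matters: the gunzip command in the original post used a relative input path, which only resolves when run from the folder that contains the file. Expect the decompressed VCF to be considerably larger than the .gz.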

  • Lora B (UKB Community team, Data Analyst)

    Hi Eric,

    Thank you for your question! Generally, the UKB cohort comprises older individuals, which may account for the high prevalence of dyslipidaemia; is the prevalence higher than you would expect for a cohort of that age distribution? The blood biochemistry panel is available for a larger proportion of the cohort than the NMR metabolomics data, which may explain the difference in prevalence between the two. Further details on the NMR metabolomics data can be found in UK Biobank Resource 3000 (https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=3000).
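    One way to check whether the difference in coverage explains the gap in prevalence is to recompute both estimates on only the participants who have both measurements. A minimal sketch in plain Python; the record layout, the keys ldl_biochem and ldl_nmr, and defining dyslipidaemia as LDL at or above a single cutoff (4.1 mmol/L here) are all hypothetical placeholders for your own data extract and case definition.

```python
import math
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per participant, LDL in mmol/L from each
# source, None where that participant has no measurement from that assay.
Record = Dict[str, Optional[float]]

LDL_CUTOFF = 4.1  # hypothetical case definition (mmol/L); substitute your own


def prevalence(values: List[float], cutoff: float = LDL_CUTOFF) -> float:
    """Fraction of values at or above `cutoff` (NaN for an empty list)."""
    if not values:
        return math.nan
    return sum(v >= cutoff for v in values) / len(values)


def compare(records: List[Record]) -> Dict[str, float]:
    """Prevalence per source, on all available data and on the shared subset."""
    biochem = [r["ldl_biochem"] for r in records if r["ldl_biochem"] is not None]
    nmr = [r["ldl_nmr"] for r in records if r["ldl_nmr"] is not None]
    # Participants measured by both assays: the like-for-like comparison.
    both = [
        r for r in records
        if r["ldl_biochem"] is not None and r["ldl_nmr"] is not None
    ]
    return {
        "biochem_all": prevalence(biochem),
        "nmr_all": prevalence(nmr),
        "biochem_overlap": prevalence([r["ldl_biochem"] for r in both]),
        "nmr_overlap": prevalence([r["ldl_nmr"] for r in both]),
    }
```

    If the two overlap-restricted estimates agree, coverage is the likely explanation; if they still differ, the assays themselves (or the choice between the NMR "LDL" and "Clinical LDL" measures) are the more likely source of the gap.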

  • Lora B (UKB Community team, Data Analyst)

    Thank you for your questions! Other members of the UK Biobank Data Analyst Team will be around to answer your questions later this week!
