Analyzing VCFs of specific disease participants to see variants in a particular gene

Arsala Ali

Hi

I am a very new user of UKB-RAP and am learning the platform through video tutorials and webinars. I have a good general understanding of how it works, but some details are still unclear to me.

I would appreciate some guidance tailored to my specific goal. I need to collect the SV VCF files of participants with vascular dementia (from the ‘whole genome SV call files (DRAGEN)’ directory under the Bulk directory). I then need to process those SV VCF files (around 2000 of them) to collect the variants in a particular gene.
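To make the gene-level step concrete, here is a minimal Python sketch of the filter described above. It rests on several assumptions: the VCFs are bgzip/gzip compressed, each SV record has the eight standard tab-separated columns, and the SV end position is given by an `END=` entry in the INFO column (breakend records without `END` are treated as single positions). The file name, chromosome, and coordinates in the usage example are hypothetical placeholders, not a real gene. In practice a tool such as bcftools with a region option would probably be the better choice for indexed files.

```python
import gzip


def sv_end(pos, info):
    """Return the END value from the INFO column, or pos if END is absent."""
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[4:])
    return pos


def overlaps_gene(line, chrom, start, end):
    """True if a VCF data line overlaps the interval chrom:start-end (1-based, inclusive)."""
    cols = line.rstrip("\n").split("\t")
    if cols[0] != chrom:
        return False
    pos = int(cols[1])
    # ASSUMPTION: column 8 (index 7) is INFO and may carry END=<int>.
    return pos <= end and sv_end(pos, cols[7]) >= start


def gene_records(vcf_path, chrom, start, end):
    """Yield the data lines of a gzip-compressed VCF that overlap the gene interval."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("#") and overlaps_gene(line, chrom, start, end):
                yield line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python gene_filter.py sample.sv.vcf.gz chr1 1000000 1050000
    if len(sys.argv) == 5:
        path, chrom, start, end = sys.argv[1], sys.argv[2], int(sys.argv[3]), int(sys.argv[4])
        for record in gene_records(path, chrom, start, end):
            print(record)
```

Running this once per participant file and concatenating the output would give one table of SVs overlapping the gene; the sample ID would need to be added to each line separately.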

Using the Cohort Browser, I have already exported a list of ~2000 participant IDs to a text file. I need to copy the corresponding ~2000 VCFs to an execution environment where I can process them with Linux command-line utilities (e.g. awk, sed, grep), bcftools, and vcftools. I have installed the DNAnexus toolkit to access the platform through the CLI, but it provides a very limited environment and it seems I can’t process VCF files there.

Could someone explain specifically how to copy the VCFs for my participant IDs to an environment where I can process them with Linux command-line utilities and vcftools/bcftools? My plan was to open the shell under JupyterLab, copy the entire ‘whole genome SV call files (DRAGEN)’ folder there with dx download, and then use a shell script to copy the ~2000 VCFs I need into another directory for analysis. I settled on copying the whole folder only because I can’t think of a way to download just the required ~2000 VCFs from project storage to the instance. Does this workflow seem correct? Also, is it possible to use vcftools/bcftools in the JupyterLab shell?
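The selection step in this plan (picking out only the files that belong to the listed participants) can be sketched in Python as below. This is only an illustration under a loud assumption: that each bulk file name begins with the participant ID followed by an underscore. The paths, IDs, and the `99999` field number in the example are hypothetical placeholders. The resulting list of paths could then be fed to a download step one file at a time, which would avoid copying the whole folder.

```python
from pathlib import PurePosixPath


def load_ids(lines):
    """Return the set of non-empty participant IDs, one per input line."""
    return {line.strip() for line in lines if line.strip()}


def select_vcfs(file_paths, participant_ids):
    """Keep .vcf.gz paths whose file name starts with '<participant_id>_'.

    ASSUMPTION: the participant ID is the part of the file name before
    the first underscore. Index files (.tbi) are deliberately skipped.
    """
    selected = []
    for path in file_paths:
        name = PurePosixPath(path).name
        eid = name.split("_", 1)[0]
        if eid in participant_ids and name.endswith(".vcf.gz"):
            selected.append(path)
    return selected


if __name__ == "__main__":
    # Hypothetical IDs and listing, for illustration only.
    ids = load_ids(["1000001\n", "1000003\n", "\n"])
    listing = [
        "/Bulk/example/10/1000001_99999_0_0.vcf.gz",
        "/Bulk/example/10/1000001_99999_0_0.vcf.gz.tbi",
        "/Bulk/example/10/1000002_99999_0_0.vcf.gz",
        "/Bulk/example/10/1000003_99999_0_0.vcf.gz",
    ]
    for path in select_vcfs(listing, ids):
        print(path)
```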

I shall be very grateful if someone who has done similar analysis (analyzing VCFs of specific disease participants to see variants in a particular gene) can please walk me through it step by step. Thanks in advance.
