Get EID of variant carriers from pVCF?

Permanently deleted user

23 August 2022 00:00
5 comments

I have a pVCF that I have filtered to include only variants that meet specific criteria. I would like to pull the EIDs of all samples who are carriers (even better if I could also pull out an indicator of zygosity).

Comments


Ondrej Klempir DNAnexus Team
- 12 September 2022 11:58
See the recent thread showing the Hail functionality and how to analyze genomics data using Hail:
https://community.dnanexus.com/s/question/0D5t0000043xrVhCAI/hail-tutorial-and-example-notebooks-for-ukbrap-analysis

I would focus especially on
https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/pVCF_import.ipynb
and
https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_varid.ipynb
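For a small, already-filtered pVCF, the carrier lookup can also be sketched in plain Python without Hail. This is only a minimal illustration, not the method used in the notebooks above. It assumes the sample column names in the `#CHROM` header line are the EIDs and that each record has a `GT` entry in its FORMAT column. The names `classify_genotype` and `carriers`, the sample IDs, and the positions in the demo data are all made up for the example.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def classify_genotype(gt):
    """Return 'het', 'hom', or None (non-carrier or missing) for a GT string."""
    alleles = gt.replace("|", "/").split("/")
    # Carrier = at least one called allele that is not the reference ("0").
    if not any(a not in ("0", ".") for a in alleles):
        return None
    # 'hom' only if every allele is called and identical (e.g. 1/1);
    # anything else with an ALT allele (0/1, 1/2, ./1) is reported as 'het'.
    if "." not in alleles and len(set(alleles)) == 1:
        return "hom"
    return "het"


def carriers(lines):
    """Yield (sample_id, variant, zygosity) for every carrier genotype."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # assumed to be the EIDs
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        variant = f"{chrom}:{pos}:{ref}:{alt}"
        for sample, call in zip(samples, fields[9:]):
            parts = call.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            zygosity = classify_genotype(gt)
            if zygosity is not None:
                yield sample, variant, zygosity


# Tiny in-memory demo with made-up sample IDs and positions.
DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "1000001", "1000002", "1000003"]),
    "\t".join(["chr19", "1000", ".", "T", "C", ".", ".", ".", "GT:DP",
               "0/1:20", "1/1:18", "0/0:25"]),
    "\t".join(["chr19", "2000", ".", "C", "T", ".", ".", ".", "GT:DP",
               "./.:0", "0/0:22", "0|1:30"]),
]

for sample, variant, zygosity in carriers(DEMO_VCF):
    print(sample, variant, zygosity, sep="\t")
```

On a real file, something like `with open_vcf("filtered.vcf.gz") as fh: rows = list(carriers(fh))` should work the same way (the file name is a placeholder). For full-size pVCF blocks, the Hail route in the notebooks above is likely the better fit.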

Former User of DNAx Community_23
- 23 September 2022 13:43
Hi Ondrej Klempir,
I am trying to import pVCF genomic data following the demo Jupyter notebook mentioned above. However, I am running into an error at the last step, storing the MT in DNAX. I set db_name and mt_name in the cell above, and I can see in the DNAnexus project that a database with the given name has been created, so I don't understand why this error is occurring. It would be great if you could help me solve this issue. Thank you!

Ondrej Klempir DNAnexus Team
- 27 September 2022 09:02
What is the name of your database in the DNAnexus project? Is it Chromosome19_b0_v1?
What is in the mt_name variable?
What is in the db_uri variable?

Former User of DNAx Community_23
- 27 September 2022 09:18
Dear Ondrej, thanks for your response! I have already solved that issue by changing the db_name. For some reason, a db_name containing a capital letter did not work.
I have another question. I want to filter variants from a gene located on chromosome 19. There are multiple vcf.gz files covering the whole of chromosome 19, and I am wondering how to import data from multiple .gz files. The demo Jupyter notebook mentions that 'regex' can be used to import data from multiple files, but I could not find any documentation on this. It would be great if you could give me some insight. Also, is there a way to find out the genomic region covered by each .gz file?
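One rough way to check which region each .gz file covers is to read the first and last record positions from each file and keep only the files that overlap the gene of interest. The sketch below assumes each file holds a single chromosome and is sorted by position. It scans the whole file, so it will be slow on full-size pVCF blocks. The helper names, file names, and coordinates are made up for the demo. On the import side, Hail's `import_vcf` accepts either a list of paths or a path containing glob expressions such as `*`, which may be what the notebook means by 'regex'.

```python
import gzip
import os
import tempfile


def vcf_span(path):
    """Return (chrom, first_pos, last_pos) of the records in one .vcf.gz file."""
    chrom = first = last = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            c, pos = line.split("\t", 2)[:2]
            if first is None:
                chrom, first = c, int(pos)
            last = int(pos)
    return chrom, first, last


def files_overlapping(paths, chrom, start, end):
    """Return the paths whose span overlaps chrom:start-end (inclusive)."""
    hits = []
    for path in sorted(paths):
        c, first, last = vcf_span(path)
        if c == chrom and first is not None and first <= end and last >= start:
            hits.append(path)
    return hits


def _write_demo(path, chrom, positions):
    """Hypothetical helper: write a tiny sites-only VCF for the demo."""
    with gzip.open(path, "wt") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
        for pos in positions:
            out.write(f"{chrom}\t{pos}\t.\tA\tG\t.\t.\t.\n")


# Demo with two made-up blocks; real pVCF file names and coordinates will differ.
demo_dir = tempfile.mkdtemp()
block0 = os.path.join(demo_dir, "demo_c19_b0.vcf.gz")
block1 = os.path.join(demo_dir, "demo_c19_b1.vcf.gz")
_write_demo(block0, "chr19", [100, 500, 900])
_write_demo(block1, "chr19", [1000, 1500, 2000])

spans = {os.path.basename(p): vcf_span(p) for p in (block0, block1)}
hits = files_overlapping([block0, block1], "chr19", 950, 1200)
print(spans)
print([os.path.basename(p) for p in hits])
```

The list returned by `files_overlapping` could then be passed to the import step in place of a single path, so that only the blocks covering the gene are loaded.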

Chai Fungtammasan DNAnexus Team
- 25 October 2022 19:13
Could you ask this as a separate question? I think community members may not have noticed that there is another question embedded here.

