I have a pVCF that I have filtered to include only variants that meet specific criteria. I would like to pull the EIDs of all the samples who are carriers (even better if I could also pull out indicators of zygosity).
Comments
See the recent thread showing the Hail functionality and how to analyze genomics data using Hail:
https://community.dnanexus.com/s/question/0D5t0000043xrVhCAI/hail-tutorial-and-example-notebooks-for-ukbrap-analysis
I would focus especially on
https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/pVCF_import.ipynb
and
https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_varid.ipynb
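To make the original question concrete (which samples carry a variant, and with what zygosity), here is a minimal sketch in plain Python. It is not the Hail workflow from the notebooks above, just an illustration of the logic on a tiny in-memory VCF. The sample IDs, positions, and the helper name `carriers_from_vcf` are hypothetical, and it assumes a standard VCF layout with a `GT` entry in the FORMAT column.

```python
def carriers_from_vcf(lines):
    """Yield (variant, sample_id, zygosity) for every non-reference genotype.

    Assumes a standard VCF: a '#CHROM' header line naming the samples and a
    FORMAT column that contains 'GT'. Missing calls (e.g. './.') are skipped.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs (EIDs) start at column 10
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        variant = f"{chrom}:{pos}:{ref}:{alt}"
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                continue  # missing genotype
            if all(a == "0" for a in alleles):
                continue  # homozygous reference, not a carrier
            zygosity = "het" if len(set(alleles)) > 1 else "hom_alt"
            yield variant, sample, zygosity


# Hypothetical toy data: three samples, two variants on chromosome 19.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "1000001", "1000002", "1000003"]
rows = [
    ["19", "44908684", ".", "T", "C", ".", ".", ".", "GT:GQ",
     "0/1:30", "0/0:25", "1/1:40"],
    ["19", "44908822", ".", "C", "T", ".", ".", ".", "GT:GQ",
     "0/0:30", "./.:0", "0|1:35"],
]
vcf_lines = ["##fileformat=VCFv4.2", "\t".join(header)]
vcf_lines += ["\t".join(r) for r in rows]

carriers = list(carriers_from_vcf(vcf_lines))
for variant, sample, zygosity in carriers:
    print(variant, sample, zygosity)
```

On a real pVCF the same idea applies, but at that scale a tool built for it (such as Hail, as in the notebooks above) is likely the more practical route.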
Hi Ondrej Klempir,
I am trying to import pVCF genomic data following the demo Jupyter notebook mentioned above. However, I am running into an error at the last step, storing the MT in DNAX. I set db_name and mt_name in the cell above, and I can see in the DNAnexus project that a database with the given name has been created, but I don't understand why this error is occurring. It would be great if you could help me solve this issue. Thank you!
What is the name of your database in DNAnexus project? Is it Chromosome19_b0_v1?
What is in the mt_name variable?
What is in the db_uri variable?
Dear Ondrej, Thanks for your response! I have already solved that issue by changing the db_name. For some reason, a db_name containing a capital letter did not work.
I have another thing to ask you. I want to filter variants from a gene located on chromosome 19. There are multiple .vcf.gz files covering the whole of chromosome 19, and I am wondering how to import data from multiple .gz files. The demo Jupyter notebook mentions that 'regex' can be used for importing data from multiple files, but I could not find any documentation on this. It would be great if you could give me some insight here. Also, is there a way to find out the genomic region covered by each .gz file?
Could you ask this as a separate question? I think other community members may not have noticed that there is another question embedded here.
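For the embedded question about which genomic region each .gz file covers, one rough way to check is to read the first and last data record of each file. The sketch below uses only the Python standard library and assumes each file holds a single chromosome with records sorted by position. The file names, the glob pattern, and the helper name `vcf_region` are hypothetical, and the demo writes two tiny files to a temporary directory so it can run anywhere. Reading a full-size pVCF block this way would be slow, so treat it as an illustration rather than a recommended workflow.

```python
import glob
import gzip
import os
import tempfile


def vcf_region(path):
    """Return (chrom, first_pos, last_pos) for a gzip-compressed VCF.

    Assumes one chromosome per file and records sorted by position.
    """
    chrom = first = last = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.split("\t", 2)
            pos = int(fields[1])
            if first is None:
                chrom, first = fields[0], pos
            last = pos
    return chrom, first, last


def write_toy_vcf(path, chrom, positions):
    """Write a minimal, sites-only VCF (hypothetical demo data)."""
    with gzip.open(path, "wt") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
        for pos in positions:
            out.write(f"{chrom}\t{pos}\t.\tA\tG\t.\t.\t.\n")


with tempfile.TemporaryDirectory() as tmp:
    write_toy_vcf(os.path.join(tmp, "example_c19_b0.vcf.gz"), "19", [100, 180, 250])
    write_toy_vcf(os.path.join(tmp, "example_c19_b1.vcf.gz"), "19", [300, 420])
    write_toy_vcf(os.path.join(tmp, "example_c20_b0.vcf.gz"), "20", [50, 75])

    # A glob-style pattern selects every chromosome 19 block in one go.
    pattern = os.path.join(tmp, "example_c19_b*.vcf.gz")
    regions = {
        os.path.basename(path): vcf_region(path)
        for path in sorted(glob.glob(pattern))
    }

for name, (chrom, start, end) in regions.items():
    print(f"{name}: chr{chrom}:{start}-{end}")
```

Once the block containing the gene of interest is known, only that file (or a pattern matching a few neighbouring blocks) needs to be imported.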