Get EID of variant carriers from pVCF?

Permanently deleted user
I have a pVCF that I have filtered to include only variants that meet specific criteria. I would like to pull the EIDs of all the samples that are carriers (even better if I could also pull out an indicator of zygosity).

Comments

  • Comment author
    Ondrej Klempir DNAnexus Team

    See the recent thread showing the Hail functionality and how to analyze genomics data using Hail:

    https://community.dnanexus.com/s/question/0D5t0000043xrVhCAI/hail-tutorial-and-example-notebooks-for-ukbrap-analysis

    I would concentrate especially on

    https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/pVCF_import.ipynb

    and

    https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_varid.ipynb
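    For a small, already-filtered pVCF, the carrier lookup asked about above can also be sketched in plain Python without Hail. This is only an illustration, not the method used in the linked notebooks: it assumes the sample columns in the `#CHROM` header line are the EIDs, that each record has a `GT` entry in its FORMAT field, and that the file has been decompressed to text. The function name `carriers_from_vcf` and the example EIDs are hypothetical.

```python
def carriers_from_vcf(lines):
    """Return (variant, sample_id, zygosity) for every non-reference genotype call."""
    samples = []
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs (assumed to be EIDs) start at column 10
            continue
        chrom, pos, _vid, ref, alt = fields[:5]
        variant = f"{chrom}:{pos}:{ref}:{alt}"
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles or all(a == "0" for a in alleles):
                continue  # missing call or homozygous reference: not a carrier
            # "hom" = all alleles identical and non-reference; anything else is "het"
            # (this includes calls with two different ALT alleles, such as 1/2).
            zygosity = "hom" if len(set(alleles)) == 1 else "het"
            results.append((variant, sample, zygosity))
    return results


# Tiny made-up example: four hypothetical EIDs, one variant.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t1000001\t1000002\t1000003\t1000004
19\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:30\t0/1:28\t1/1:31\t./.:0
"""

carriers = carriers_from_vcf(EXAMPLE_VCF.splitlines())
for variant, eid, zygosity in carriers:
    print(variant, eid, zygosity)
```

    For chromosome-scale data the Hail approach in the notebooks above is the better fit; line-by-line parsing like this is only practical for a handful of variants.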

  • Comment author
    Former User of DNAx Community_23

    Hi Ondrej Klempir,

    I am trying to import pVCF genomic data by following the demo Jupyter notebook mentioned above. However, I am hitting an error at the last step, storing the MT in DNAX. I set db_name and mt_name in the cell above, and I can see in the DNAnexus project that a database with the given name has been created, so I don't understand why this error occurs. It would be great if you could help me solve this issue. Thank you!

    [image: error]

  • Comment author
    Ondrej Klempir DNAnexus Team

    What is the name of your database in the DNAnexus project? Is it Chromosome19_b0_v1?

    What is in the mt_name variable?

    What is in the db_uri variable?

  • Comment author
    Former User of DNAx Community_23

    Dear Ondrej, thanks for your response! I have already solved that issue by changing the db_name. For some reason, a db_name containing a capital letter did not work.

    I have another question. I want to filter variants from a gene located on chromosome 19, and there are multiple .vcf.gz files covering the whole chromosome. How do I import data from multiple .gz files? The demo Jupyter notebook mentions that a 'regex' can be used to import data from multiple files, but I could not find any documentation on it. It would be great if you could give me some insight here. Also, is there a way to find out the genomic region covered by each .gz file?
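    To make the 'regex' point concrete: a minimal sketch, assuming the notebook means a shell-style glob pattern (where `*` matches any run of characters) rather than a full regular expression. The file names below are hypothetical placeholders, not the real file names in the project. The sketch only shows which names such a pattern would select; I believe Hail's `import_vcf` accepts either a glob pattern or a list of paths, but that should be checked against the Hail documentation.

```python
from fnmatch import fnmatch


def select_chunks(names, pattern):
    """Return the file names that match a shell-style glob pattern, sorted."""
    return sorted(name for name in names if fnmatch(name, pattern))


# Hypothetical file names for illustration only.
names = [
    "ukb_c19_b0_v1.vcf.gz",
    "ukb_c19_b1_v1.vcf.gz",
    "ukb_c19_b0_v1.vcf.gz.tbi",  # index file, should not be selected
    "ukb_c20_b0_v1.vcf.gz",      # different chromosome, should not be selected
]

# "*" stands in for the block number, so every chromosome 19 block matches.
pattern = "ukb_c19_b*_v1.vcf.gz"
selected = select_chunks(names, pattern)
print(selected)
```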

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Could you ask this as a separate question? I think community members may not notice that there is another question embedded here.

