Best resources for joint call pVCF exploration?

Dear community,

I'm interested in analysing the Whole genome GraphTyper joint call pVCF, and I was wondering whether there are any best practices for exploring this data in the terminal, RStudio, or Jupyter.

I have already evaluated the following resources, but none of them offered specific examples for this case:

Could you please point me to the best resource for learning how to use the Whole genome GraphTyper joint call pVCF data in the DNAnexus terminal, Jupyter, or RStudio?

If possible, a step-by-step tutorial with examples would be invaluable.

Thank you in advance for your support,

Veronica

Comments

  • Chai Fungtammasan DNAnexus Team

    I would say that most of the UKB-RAP tutorials (Overview, Jupyter, RStudio) cover basic file exploration. The best option depends on the tools you usually use to explore files. If you are an R person, use RStudio. If you code in Python, JupyterLab would be the best option. If you usually explore files with Unix tools, then ttyd, a cloud workstation, or swiss-army-knife would be the way to go. I have embedded links to the related tutorials above.

    Here is an example of how to explore the file in swiss-army-knife:

    bcftools view <WGS_file.vcf.gz> | head -2000 > header_2k_line.txt

    Then you can view the header part of the file in header_2k_line.txt.
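
As a rough sketch of what to look for in that header, the snippet below separates the three parts of a VCF (the `##` meta lines, the single `#CHROM` column line, and the records) using only standard Unix tools. The file it reads is a tiny made-up stand-in (three hypothetical samples S1 to S3, two variants), not UKB data, and the variable names are illustrative only.

```shell
# Build a tiny stand-in pVCF (hypothetical data: 3 samples, 2 variants).
tmpdir=$(mktemp -d)
vcf="$tmpdir/toy.vcf.gz"
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1>\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\n'
  printf 'chr1\t200\t.\tC\tT\t40\tPASS\t.\tGT\t0/1\t0/0\t0/0\n'
} | gzip -c > "$vcf"

# Meta-information lines start with '##'.
n_meta=$(gzip -dc "$vcf" | grep -c '^##')

# The single '#CHROM' line names the columns; samples start at column 10.
n_samples=$(gzip -dc "$vcf" | grep -m1 '^#CHROM' | awk -F'\t' '{print NF - 9}')

# First record, cut down to the 9 fixed columns plus the first two samples,
# so that a line holding a very large number of genotypes stays readable.
first_record=$(gzip -dc "$vcf" | grep -v '^#' | head -n 1 | cut -f1-11)

echo "meta lines: $n_meta"
echo "samples: $n_samples"
echo "first record: $first_record"

rm -rf "$tmpdir"
```

On a real pVCF with bcftools available, `bcftools view -h` prints only the header and `bcftools query -l` lists the sample IDs, which avoids guessing how many lines to keep with `head`.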

    If you want to do this more interactively, you can download the data you need onto the workstation with `dx download`, install the tools you would use to explore the file, and run those tools there.

    For example, you can use ttyd (see the instructions in the Overview video). ttyd does not have bcftools pre-installed, so you need to install it yourself (see the instructions at http://www.htslib.org/download/).

    Once the tool is installed and the input data is on the workstation, you can run a command like the one below to inspect the file.

    bcftools view <WGS_file.vcf.gz> | less
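
Beyond paging through the file, a per-variant summary is often easier to read than raw genotype columns. The sketch below counts alternate alleles (AC) and called alleles (AN) per record with awk. It runs on a tiny made-up file (hypothetical samples S1 to S3), assumes GT is the first FORMAT subfield, and counts any non-reference allele as alternate, so treat it as an illustration rather than a validated QC step.

```shell
# Build a tiny stand-in pVCF (hypothetical data, one missing genotype).
tmpdir=$(mktemp -d)
vcf="$tmpdir/toy.vcf.gz"
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\n'
  printf 'chr1\t200\t.\tC\tT\t40\tPASS\t.\tGT\t0/1\t0/0\t./.\n'
} | gzip -c > "$vcf"

# For each record, walk the sample columns (10 onwards) and tally alleles.
summary=$(gzip -dc "$vcf" | awk -F'\t' '
  /^#/ { next }                        # skip header lines
  {
    alt = 0; called = 0
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")                # GT assumed to be the first subfield
      n = split(f[1], a, "[/|]")       # unphased (/) or phased (|) alleles
      for (j = 1; j <= n; j++) {
        if (a[j] == ".") continue      # missing allele: not counted
        called++
        if (a[j] != "0") alt++
      }
    }
    printf "%s:%s %s>%s AC=%d AN=%d\n", $1, $2, $4, $5, alt, called
  }')

echo "$summary"

rm -rf "$tmpdir"
```

On real data, the same awk program can read from `bcftools view -r <region> <WGS_file.vcf.gz>` instead of `gzip -dc`, assuming the file is indexed, so that only a small region is scanned.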

    There are more sophisticated tricks too, such as using dxfuse, but the guidelines I have posted here will work on all interactive workers.

  • Former User of DNAx Community_30

    Hi Chai,

    Thank you very much for your complete answer! It really helped me a lot.

    Just a quick follow-up question: I tried the options you suggested, and when I ran dx download on the .tab.gz file of the WGS QC metrics, as shown below, I got a permission-denied error.

    I was wondering whether I did anything wrong and whether it is possible to resolve this error.

    dx download '/Bulk/Whole genome sequences/Whole genome GraphTyper joint call pVCF/QC/qc_metrics_graphtyper_v2.7.1_qc.tab.gz'

    ERROR:dxpy:[Thu Sep 8 16:56:52 2022] GET https://dl.ew2.dnanex.us/F/D2PRJ/file-G3XKYZ0JYv1Xjqg06k9Y8B8f/project-GFGfXB8JF4FKG8jb16p197J8: {"error":{"type":"PermissionDenied","message":"Cannot download due to prohibitExternalDownload policy rule"}}.

    Thanks!

    Veronica

  • Chai Fungtammasan DNAnexus Team

    Per UKB policy, WGS and WES data cannot be exported outside the RAP, so these files have to be explored on the RAP itself. You would have to run one of the interactive workstations I mentioned above and download the file to that workstation rather than to your local computer.