How to filter a snpEff.vcf.gz file on UKB RAP?

Permanently deleted user
I've been working with UKB WES data on RAP and I have a snpEff.vcf.gz file. The file is very large, and I'm only interested in specific genes and specific information. How can I filter the file down to just the things that interest me on UKB RAP?

Comments

7 comments

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hi @Burcu Çevik,

     

    I would start JupyterLab and explore the VCF there.

    If you prefer bash, one good option is vcftools [https://vcftools.sourceforge.net/man_latest.html].

    If you prefer Python, my favourite tool is PyVCF [https://pyvcf.readthedocs.io/en/latest/FILTERS.html].

    Besides interactive work in JupyterLab, vcftools is also bundled with the Swiss Army Knife app, which is supported on UKB-RAP.
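
For the JupyterLab route, here is a minimal sketch of gene-level filtering that uses only the Python standard library. It assumes the file carries snpEff's standard ANN entry in the INFO column, where the gene name is the fourth pipe-delimited subfield. The function names and the example gene are hypothetical, chosen for illustration.

```python
import gzip


def ann_gene_names(info_field):
    """Return the set of gene names found in the snpEff ANN entry of an INFO column."""
    genes = set()
    for entry in info_field.split(";"):
        if entry.startswith("ANN="):
            # Annotations are comma-separated; each one is pipe-delimited and
            # Gene_Name is the 4th subfield (assumes the standard ANN layout).
            for annotation in entry[4:].split(","):
                parts = annotation.split("|")
                if len(parts) > 3 and parts[3]:
                    genes.add(parts[3])
    return genes


def filter_vcf_by_gene(in_path, out_path, genes_of_interest):
    """Copy header lines plus records annotated with a wanted gene.

    Returns the number of records kept. The output is plain gzip, not bgzip,
    so it would need recompressing with bgzip before tabix/bcftools indexing.
    """
    kept = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep meta-information and column header lines
                continue
            info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
            if ann_gene_names(info) & genes_of_interest:
                dst.write(line)
                kept += 1
    return kept
```

A call such as `filter_vcf_by_gene("1798762.snpEff.vcf.gz", "filtered.vcf.gz", {"BRCA2"})` streams the file once, so it never loads the whole VCF into memory.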

  • Comment author
    Chai Fungtammasan DNAnexus Team

    I personally like to use bcftools in swiss-army-knife.

  • Comment author
    Permanently deleted user

    Thanks for your answers. I tried bcftools in Swiss Army Knife (SAK) with a BED file to filter for specific genes, but I received an error. The log file is below. I don't know where I went wrong. Do you have any suggestions?

     

    Failure from origin-job

    --------------------------------

    {

      "id": "job-GVQ8bZjJbBpgxgJBY2q8zk6X",

      "name": "Swiss Army Knife",

      "function": "main",

      "stage": null,

      "analysis": null,

      "executable": "app-GKyyzJQ951j4Bkfq4jFkGX1K",

      "executableName": "swiss-army-knife",

      "failureReason": "AppError",

      "failureMessage": "Error while running the command (please refer to the job log for more information)."

    }

     

    Origin-job Inputs

    --------------------

    {

      "in": [

        {

          "$dnanexus_link": {

            "project": "project-GKQbbYQJbBpk5YXf0JpzGXKK",

            "id": "file-GVQ8b50JbBpx6QJf70YkPPv2"

          }

        },

        {

          "$dnanexus_link": {

            "project": "project-GKQbbYQJbBpk5YXf0JpzGXKK",

            "id": "file-GVBJvy0JK69B1QFPkbb12k12"

          }

        }

      ],

      "cmd": "bcftools view -R deneme.bed 1798762.snpEff.vcf.gz > filtered.vcf",

      "mount_inputs": false

    }

     

    View log of failed sub-job

    --------------------------------

    Logging initialized (priority)

    Downloading bundled file resources.tar.gz

    >>> Unpacking resources.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file qctool.tar.gz

    >>> Unpacking qctool.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file plato.tar.gz

    >>> Unpacking plato.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file bedtools.tar.gz

    >>> Unpacking bedtools.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file htslib.tar.gz

    >>> Unpacking htslib.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file java.tar.gz

    >>> Unpacking java.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file plink.tar.gz

    >>> Unpacking plink.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file r.tar.gz

    >>> Unpacking r.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file sambamba.tar.gz

    >>> Unpacking sambamba.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file seqtk.tar.gz

    >>> Unpacking seqtk.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file vcflib.tar.gz

    >>> Unpacking vcflib.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file vcftools.tar.gz

    >>> Unpacking vcftools.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file plink2.tar.gz

    >>> Unpacking plink2.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file regenie.tar.gz

    >>> Unpacking regenie.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file bolt-lmm_asset.tar.gz

    >>> Unpacking bolt-lmm_asset.tar.gz to /

    tar: Removing leading `/' from member names

    Downloading bundled file bgen.tar.gz

    >>> Unpacking bgen.tar.gz to /

    tar: Removing leading `/' from member names

    dxpy/0.346.0 (Linux-5.15.0-1031-aws-x86_64-with-glibc2.29)

    bash running (job ID job-GVQ8bZjJbBpgxgJBY2q8zk6X)

    downloading file: file-GVQ8b50JbBpx6QJf70YkPPv2 to filesystem: /home/dnanexus/in/in/0/deneme.bed

    downloading file: file-GVBJvy0JK69B1QFPkbb12k12 to filesystem: /home/dnanexus/in/in/1/1798762.snpEff.vcf.gz

    Using dxfuse version v1.0.0

    The log file is located at /root/.dxfuse/dxfuse.log

    starting fs daemon

    wait for ready

    Daemon started successfully

    Downloading files using 4 threads

    + [[ '' == '' ]]

    + eval 'bcftools view -R deneme.bed 1798762.snpEff.vcf.gz > filtered.vcf'

    ++ bcftools view -R deneme.bed 1798762.snpEff.vcf.gz

    [E::idx_find_and_load] Could not retrieve index file for '1798762.snpEff.vcf.gz'

    Failed to read from 1798762.snpEff.vcf.gz: could not load index

    END_LOG

  • Comment author
    Ondrej Klempir DNAnexus Team

    The error message "[E::idx_find_and_load] Could not retrieve index file for '1798762.snpEff.vcf.gz'" means that bcftools needs an index file for your VCF: the -R option jumps to the requested regions through the index. Building one with tabix or bcftools index should solve the problem. You can do that on the same worker, right before the bcftools view command.

     

    I have not tested this myself, but these threads look relevant:

    https://github.com/samtools/bcftools/issues/129

    https://www.biocomputix.com/post/bctools-index-how-to-create-index-for-vcf-files
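
The advice above can be sketched as a small Python helper. It checks whether a `.tbi` or `.csi` index sits next to the VCF and assembles a command string that indexes first when none is found. The function names and the output filename are hypothetical. The only bcftools options used are `index -t` (write a tabix index), `view -R` (regions file), `-Oz` (compressed VCF output) and `-o` (output file). This is an untested sketch with respect to the Swiss Army Knife environment.

```python
import os
import shlex


def find_index(vcf_path):
    """Return the path of a .tbi or .csi index next to vcf_path, or None if absent."""
    for suffix in (".tbi", ".csi"):
        candidate = vcf_path + suffix
        if os.path.exists(candidate):
            return candidate
    return None


def region_filter_command(vcf_name, bed_name, out_name, have_index):
    """Build a shell command string that filters vcf_name by the regions in bed_name.

    If no index is available, a 'bcftools index -t' step is prepended so that
    'bcftools view -R' can find the .tbi file it needs.
    """
    view = "bcftools view -R {} -Oz -o {} {}".format(
        shlex.quote(bed_name), shlex.quote(out_name), shlex.quote(vcf_name)
    )
    if have_index:
        return view
    return "bcftools index -t {} && {}".format(shlex.quote(vcf_name), view)
```

For the job in this thread, `region_filter_command("1798762.snpEff.vcf.gz", "deneme.bed", "filtered.vcf.gz", have_index=False)` yields a string that could be pasted into the `cmd` input of Swiss Army Knife.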

  • Comment author
    Permanently deleted user

    Hi @Ondrej Klempir,

    Is the index file the 1798762.snpEff.vcf.gz.tbi file? I already have that file, so I don't need to build one.

  • Comment author
    Ondrej Klempir DNAnexus Team

    Yes, I would say so. Add it as an additional input file to your Swiss Army Knife job, alongside the VCF and the BED file.
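
As an illustration of that suggestion, the sketch below rebuilds the job inputs in the same shape as the "Origin-job Inputs" shown in the failed job, but with three files instead of two. The project and file IDs are hypothetical placeholders and the helper name is invented. It assumes the `.tbi` file is staged next to the VCF in the same way as the other inputs.

```python
import json


def sak_inputs(project_id, file_ids, cmd):
    """Assemble Swiss Army Knife inputs in the shape shown in the job log."""
    return {
        "in": [
            {"$dnanexus_link": {"project": project_id, "id": file_id}}
            for file_id in file_ids
        ],
        "cmd": cmd,
        "mount_inputs": False,
    }


# Placeholder IDs: substitute the real project and file IDs.
inputs = sak_inputs(
    "project-xxxx",
    ["file-xxxx-bed", "file-xxxx-vcf", "file-xxxx-tbi"],  # BED, VCF, and its .tbi index
    "bcftools view -R deneme.bed 1798762.snpEff.vcf.gz > filtered.vcf",
)
print(json.dumps(inputs, indent=2))
```

The command itself is unchanged from the failed job. The only difference is the third entry in `in`, which makes the index available to bcftools.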

  • Comment author
    Permanently deleted user

    Thank you Ondrej. This time I did not receive an error message.

