How to run a VEP Docker image using swiss-army-knife?

Hi,

For the past few months I've been trying to annotate my VCFs with VEP. I've managed to do this with Docker rather than Hail. However, I want to loop my Docker annotation command over multiple files, and I see that this can be done with swiss-army-knife and -iimage. This is the code I am using:

for N in 1; do
  for FILE in $(dx ls "/mnt/project/FI_subset_QC_pVCFs/ukbXXXXX_c${N}_b*_v1.FI_subset.hwe10-6.maxmissing0.02.vcf.gz"); do
    dx run swiss-army-knife \
      -iin="${project}:/vep_docker.tar.gz" \
      -icmd="./vep -i 'vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf' \
        --cache --offline --format vcf --vcf --force_overwrite --assembly GRCh38 \
        --fasta 'vep/.vep/homo_sapiens/106_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz' \
        --fork 8 --dir_cache vep/.vep --dir_plugins vep/.vep/Plugins --dir vep/.vep \
        --symbol --hgvs --numbers --canonical --per_gene --no_check_variants_order \
        --input_file '/mnt/project/FI_subset_QC_pVCFs/${FILE}' \
        --output_file '${FILE%.vcf.gz}.annotated.vcf' \
        --af_gnomad /mnt/project/gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.gz,gnomAD_v2_exome,vcf,exact,0,AF_nfe \
        --plugin dbNSFP,/mnt/project/dbNSFP4.3a_grch38.gz,Ensembl_transcriptid,SIFT4G_score,Polyphen2_HDIV_score,MPC_score,REVEL_score,FATHMM_score,clinvar_id,clinvar_clnsig,clinvar_trait,clinvar_review,clinvar_hgvs,clinvar_var_source,clinvar_MedGen_id,clinvar_OMIM_id,clinvar_Orphanet_id,Interpro_domain,GTEx_V8_gene,GTEx_V8_tissue,Geuvadis_eQTL_target_gene \
        --plugin CADD,'/mnt/project/whole_genome_SNVs.tsv.gz','/mnt/project/gnomad.genomes.r3.0.indel.tsv.gz'" \
      --iimage_file="vep_docker.tar.gz" \
      --destination="${project}:/FI_Annotated_VEP_vcfs/" \
      -imount_inputs=FALSE -y
  done
done

However, this code fails, and I know it's due to the folders I'm specifying within the Docker container. I have already downloaded all the dependencies. Here is the error message:

dxpy.utils.resolver.ResolutionError: The specified folder could not be found in project

Any help in fixing this code would be very much appreciated. I've been at this for a while now and it's incredibly frustrating. I'm really hoping I can resolve this soon and move on.
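One possible source of the ResolutionError is the listing step rather than the container: `dx ls` resolves paths inside the DNAnexus project, where the folder would be `/FI_subset_QC_pVCFs`, while `/mnt/project/...` only exists on the local dxfuse mount. The following is a minimal, untested sketch of a per-file submission under these assumptions: `project` holds the project ID, each VCF is passed as an `-iin` input so it is downloaded (not mounted) into the directory where `cmd` runs, the Docker tarball goes into swiss-army-knife's `image_file` input (check `dx run swiss-army-knife --help`), and results are written to the working directory because `/mnt/project` is read-only. The `run` helper, `DRY_RUN` switch and `submit_vep` function are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (untested). Assumptions: the VCFs are in the project folder
# /FI_subset_QC_pVCFs, swiss-army-knife downloads each -iin file into the
# directory where cmd runs, and the Docker tarball goes in its image_file input.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Submit one swiss-army-knife job for one VCF.
#   $1: project ID, $2: VCF file name (no folder)
submit_vep() {
  local project="$1" file="$2"
  local out="${file%.vcf.gz}.annotated.vcf"
  # VEP treats -i as an alias of --input_file, so only one input flag is given.
  # Input is the downloaded file (relative path); output stays in the working
  # directory because /mnt/project is read-only.
  # Append the --af_gnomad and --plugin options from the original command here.
  local cmd="./vep --input_file '${file}' --output_file '${out}' \
--cache --offline --format vcf --vcf --force_overwrite --assembly GRCh38 \
--fasta 'vep/.vep/homo_sapiens/106_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz' \
--dir_cache vep/.vep --dir_plugins vep/.vep/Plugins --dir vep/.vep --fork 8 \
--symbol --hgvs --numbers --canonical --per_gene --no_check_variants_order"
  # mount_inputs=false asks the app to download inputs rather than mount them.
  run dx run swiss-army-knife \
    -iin="${project}:/FI_subset_QC_pVCFs/${file}" \
    -iimage_file="${project}:/vep_docker.tar.gz" \
    -icmd="${cmd}" \
    --destination="${project}:/FI_Annotated_VEP_vcfs/" \
    -imount_inputs=false \
    -y
}
```

To submit one job per file, the function could be called from a loop such as `for FILE in $(dx ls "/FI_subset_QC_pVCFs/ukbXXXXX_c1_b*_v1.FI_subset.hwe10-6.maxmissing0.02.vcf.gz"); do submit_vep "${project}" "${FILE}"; done`, ideally run first with `DRY_RUN=1` to check the printed commands.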

Comments

  • Ondrej Klempir (DNAnexus Team)

    Hi Thomas,

    What happens if you do not use dxfuse (that is, you avoid /mnt/project) and instead pass the input to dx run swiss-army-knife directly with -iin="${FILE}" in each iteration of the for loop? The file would then be downloaded onto the worker, which may get around the mounting issues.

    Also, you cannot write to /mnt/project/; it is read-only.

    https://github.com/dnanexus/dxfuse
  • Ondrej Klempir (DNAnexus Team)

    I heard from the DNAnexus Support team that you have a solution for now: you resolved the issue by running a loop inside a VEP Docker container, correct? Be aware that this approach may be slow even on larger instances, and VEP can crash.

    I would definitely be excited to see a tutorial on running VEP on the RAP.
  • Hi @Ondrej Klempir,

    Yes, I found a solution by running a loop inside the VEP Docker container. It is slower, but running the Docker image through swiss-army-knife was proving too difficult. The VEP buffer size (--buffer_size) and --fork options need tuning to keep the process from crashing. Annotating all VCFs can take half a day or more on the mem3_ssd2_v2_x8 instance.

    I agree that a more thorough guide to annotating variants is needed. Hail does work and may suit some people, but I feel most people are more comfortable working with VCFs than with Hail tables, so the Docker solution might be preferred.
  • Hello @Thomas Dinneen!

    I'm new to the RAP and VEP, and I was wondering how you got VEP to run in a JupyterLab notebook. I'm currently using the mem2_ssd1_v2_x8 instance. I have a saved .tar.gz of the Docker image with VEP, but I'm unsure how to run it in the notebook.
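For the in-container approach discussed in this thread (load the saved image once, then loop over the VCFs while tuning `--fork` and `--buffer_size`), here is a minimal, untested sketch that could be run from a terminal on the instance, for example the JupyterLab terminal. Assumptions: Docker is available on the instance, `vep_docker.tar.gz` was produced by `docker save`, `vep` is on the image's `PATH`, and the VCFs sit in a local folder (for example copied with `dx download`) with results written to another local folder, since `/mnt/project` is read-only. The mount points `/data/in` and `/data/out`, the `FORK`/`BUFFER` defaults, and the `run`/`DRY_RUN` helper are hypothetical; cache, FASTA and plugin options from the original command still need to be added.

```shell
#!/usr/bin/env bash
# Minimal sketch (untested) of annotating VCFs one by one inside a saved VEP image.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Load an image archive written by `docker save` (once per session).
load_vep_image() {
  run docker load -i "$1"
}

# Annotate every *.vcf.gz in a local folder with the given image.
#   $1: input folder, $2: output folder (absolute paths, as docker -v needs)
#   $3: image name, as printed by `docker load`
# FORK and BUFFER (hypothetical variables) map to VEP's --fork and --buffer_size;
# lower them if VEP runs out of memory on the instance.
annotate_all() {
  local in_dir="$1" out_dir="$2" image="$3"
  local fork="${FORK:-4}" buffer="${BUFFER:-5000}"
  local vcf name
  run mkdir -p "${out_dir}"
  for vcf in "${in_dir}"/*.vcf.gz; do
    [[ -e "${vcf}" ]] || continue  # the glob matched nothing
    name="$(basename "${vcf}" .vcf.gz)"
    # Add --dir_cache, --fasta and --plugin options as in the original command.
    run docker run --rm \
      -v "${in_dir}:/data/in:ro" \
      -v "${out_dir}:/data/out" \
      "${image}" \
      vep --input_file "/data/in/${name}.vcf.gz" \
        --output_file "/data/out/${name}.annotated.vcf" \
        --cache --offline --format vcf --vcf --force_overwrite --assembly GRCh38 \
        --fork "${fork}" --buffer_size "${buffer}"
  done
}
```

A possible session: `load_vep_image vep_docker.tar.gz`, then `FORK=4 BUFFER=1000 annotate_all "$PWD/vcfs" "$PWD/annotated" <image name printed by docker load>`, and finally `dx upload` the annotated files back to the project. Processing files sequentially keeps memory use predictable, at the cost of the long run times mentioned above.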
