Where is dxjupyterlab-vep on spark cluster with HAIL-VEP? Cache is not found either.

I'm trying to annotate my quality-controlled pVCF files on UKB-RAP, using a mem3_ssd2_v2_x16 Spark cluster instance with HAIL-0.2.78-VEP-1.03. The main error I'm getting says the path to the VEP cache is incorrect, but on further inspection it appears VEP isn't fully installed.

 

The error I get is that the cache doesn't exist, which is strange because I'm using the same directory path from the example DNAnexus .json configuration file:

https://documentation.dnanexus.com/user/jupyter-notebooks/dxjupyterlab-spark-cluster

 

/cluster/vep exists but /home/dnanexus/dxjupyterlab-vep doesn't. I've tried to find dxjupyterlab-vep and it doesn't exist either. Is this a DNAnexus issue?

 

Here is the code I'm using to import the VCF and annotate. I've also pasted the .json configuration file:

 

Importing & annotating VCFs:

 

import hail as hl

mt = hl.import_vcf(vcf_path, force_bgz=True, reference_genome='GRCh38')

annotated_mt = hl.vep(mt, "file:///mnt/project/Test_VEP_results/VEP_json_file_2022.04.12_v2.json", csq=True)

 

.json configuration file:

 

{"command": [

   "docker",

   "run",

   "-i",

   "-v",

   "/cluster/vep:/root/.vep",

   "dnanexus/dxjupyterlab-vep",

   "./vep",

   "--dir_cache", "/root/.vep/",

   "--fasta", "/root/.vep/homosapiens/103_GRCh38/Homosapiens.GRCh38.dna.toplevel.fa.gz",

   "--fork","60"

],

 "env": {

   "PERL5LIB": "/root/.vep/Plugins"

 },

 "vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele:String,amr_maf:Float64,clin_sig:Array[String],end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele:String,exac_fin_maf:Float64,exac_maf:Float64,exac_nfe_allele:String,exac_nfe_maf:Float64,exac_oth_allele:String,exac_oth_maf:Float64,exac_sas_allele:String,exac_sas_maf:Float64,id:String,minor_allele:String,minor_allele_freq:Float64,phenotype_or_disease:Int32,pubmed:Array[Int32],sas_allele:String,sas_maf:Float64,somatic:Int32,start:Int32,strand:Int32}],context:String,end:Int32,id:String,input:String,intergenic_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],impact:String,minimised:Int32,variant_allele:String}],most_severe_consequence:String,motif_feature_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],high_inf_pos:String,impact:String,minimised:Int32,motif_feature_id:String,motif_name:String,motif_pos:Int32,motif_score_change:Float64,strand:Int32,variant_allele:String}],regulatory_feature_consequences:Array[Struct{allele_num:Int32,biotype:String,consequence_terms:Array[String],impact:String,minimised:Int32,regulatory_feature_id:String,variant_allele:String}],seq_region_name:String,start:Int32,strand:Int32,transcript_consequences:Array[Struct{allele_num:Int32,amino_acids:String,appris:String,biotype:String,canonical:Int32,ccds:String,cdna_start:Int32,cdna_end:Int32,cds_end:Int32,cds_start:Int32,codons:String,consequence_terms:Array[String],distance:Int32,domains:Array[Struct{db:String,name:String}],exon:String,gene_id:String,gene_pheno:Int32,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int32,impact:String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int32,polyphen_prediction:String,polyphen_score:Float64,protein_end:Int32,protein_start:Int32,protein_id:String,sift_prediction:String,sift_score:Float64,strand:Int32,swissprot:String,transcript_id:String,trembl:String,tsl:Int32,uniparc:String,variant_allele:String}],variant_class:String}"

 }

 

*Note: that is not the exact path to the FASTA file. I can't post the actual path due to restrictions on language.
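A note on reading the "command" list above: in a `docker run` invocation, the first argument after the options is the image name, so dnanexus/dxjupyterlab-vep refers to a Docker image rather than a directory such as /home/dnanexus/dxjupyterlab-vep, and the `-v` entry maps the host directory /cluster/vep to /root/.vep inside the container. The sketch below splits such a list into those parts. The function name `split_docker_run` is made up for illustration, and it only understands the two options used in this configuration (`-i` and `-v`); it is not a general parser for the docker CLI.

```python
def split_docker_run(command):
    """Split a `docker run` argument list into (mounts, image, inner command).

    Simplified sketch: treats `-v` as taking one value and every other
    leading option (such as `-i`) as a flag without a value.
    """
    if command[:2] != ["docker", "run"]:
        raise ValueError("expected a 'docker run' command")
    mounts = []
    i = 2
    while i < len(command) and command[i].startswith("-"):
        if command[i] == "-v":
            # A volume mount is written host_path:container_path
            host, container = command[i + 1].split(":", 1)
            mounts.append((host, container))
            i += 2
        else:
            i += 1
    image = command[i]        # first non-option argument is the image name
    inner = command[i + 1:]   # everything after it runs inside the container
    return mounts, image, inner


command = ["docker", "run", "-i", "-v", "/cluster/vep:/root/.vep",
           "dnanexus/dxjupyterlab-vep", "./vep", "--dir_cache", "/root/.vep/"]
mounts, image, inner = split_docker_run(command)
print("image:", image)
print("mounts:", mounts)
print("runs in container:", inner)
```

Read this way, the paths passed to ./vep (for example --dir_cache /root/.vep/) are paths inside the container, which correspond to /cluster/vep on the host.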

Comments


  • Ondrej Klempir DNAnexus Team

    What was the number of nodes you started the cluster with and the exact error message?

  • Hi Ondrej,

     

    I'm using three nodes. I got this error message in the Python 3 console:

    Error summary: HailException: VEP command 'docker run -i -v /cluster/vep:/root/.vep dnanexus/dxjupyterlab-vep ./vep --dir_cache /mnt/project/root/.vep/ --fasta /root/.vep/homosapiens/103_GRCh38/homosapiens.GRCh38.dna.toplevel.fa.gz --fork 60' failed with non-zero exit status 2

     

    And there is this error message in the JupyterLab log file:

    Failed to fetch package metadata for 'jupyterlab_dx_extension': <HTTPError 404: 'Not Found'>

  • Ondrej Klempir DNAnexus Team

    Issues like this can be caused by custom settings introduced into the vep.json file.

     

    https://documentation.dnanexus.com/user/jupyter-notebooks/dxjupyterlab-spark-cluster#using-vep-with-hail

  • I figured out the issue with help from Matej. Yes, the custom .json file was not correctly formatted, but the .json file also needs to be stored in the root directory and not in a subfolder. Thanks for the help!
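
Since the fix came down to the formatting of the .json file, a quick check before calling hl.vep may save a cluster run. The following is a minimal sketch, not part of Hail or the DNAnexus tooling: `check_vep_config` is a hypothetical helper that only confirms the file parses as JSON and contains the three top-level keys seen in the configuration posted in the question ("command", "env", "vep_json_schema"). It does not validate the schema string itself or where the file is stored.

```python
import json
from pathlib import Path

# Top-level keys present in the configuration posted in the question.
REQUIRED_KEYS = ("command", "env", "vep_json_schema")


def check_vep_config(path):
    """Return a list of problems found in a VEP config file (empty if none)."""
    try:
        config = json.loads(Path(path).read_text())
    except json.JSONDecodeError as err:
        # Raised for syntax errors, including a raw line break inside a string.
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top level is not a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    command = config.get("command")
    if command is not None and not (
        isinstance(command, list) and all(isinstance(arg, str) for arg in command)
    ):
        problems.append("'command' should be a list of strings")
    return problems
```

Calling it on the config path (without the file:// prefix) returns an empty list when none of these problems are found. Python's json module rejects unescaped control characters inside strings by default, so a value that was wrapped across lines when copied would be reported as invalid JSON.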

