Missing Loftee annotations when running with Hail

James Harrison

We've been trying to annotate variants from the WES data using VEP and Loftee via Hail in a Spark Cluster-Enabled DXJupyterLab. Our workflow is similar to the first 3 sections of this example notebook: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/MatrixTable_variant_annotation_with_VEP.ipynb 

The main difference is that instead of reading in a precomputed Hail MatrixTable, we get one by loading in a WES pVCF: 
mt = hl.import_vcf(
    format_path_to_vcf_for_hail(path_to_pvcf),
    force_bgz=True,
    reference_genome="GRCh38",
    array_elements_required=False,
)

We are using the example JSON shown on this page as the VEP config: https://documentation.dnanexus.com/user/jupyter-notebooks/dxjupyterlab-spark-cluster 

When we do so, the vast majority of variants that are annotated as HIGH impact by VEP return "None" for Loftee, despite these same variants being annotated as pLoF in gnomAD. Additionally, when we run VEP + Loftee locally with a fresh install of VEP and Loftee (GRCh38 branch), this returns the same HC Loftee annotations for these variants. Any idea what might be the cause? 

This behaviour should be replicable by using a small VCF with only this variant in it: 1-19657079-G-T (GRCh38)
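
For anyone trying to reproduce this, a minimal sketch of building that single-variant file in plain Python follows. It rests on several assumptions not stated in the post: a sites-only VCF (no sample columns) is enough to trigger the behaviour, the contig needs the "chr" prefix to match GRCh38 contig naming in Hail, and the function name variant_to_vcf is purely illustrative. The resulting text would still need to be written to a file (and compressed, if the import step expects that) before loading.

```python
def variant_to_vcf(variant: str, add_chr_prefix: bool = True) -> str:
    """Turn a 'chrom-pos-ref-alt' string into a minimal sites-only VCF.

    Hypothetical helper for illustration; not part of Hail or VEP.
    """
    chrom, pos, ref, alt = variant.split("-")
    # Assumption: the GRCh38 reference expects 'chr1'-style contig names.
    if add_chr_prefix and not chrom.startswith("chr"):
        chrom = "chr" + chrom
    header = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # int(pos) fails early if the position is not a number.
    record = "\t".join([chrom, str(int(pos)), ".", ref, alt, ".", ".", "."])
    return "\n".join(header + [record]) + "\n"


vcf_text = variant_to_vcf("1-19657079-G-T")
```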

Comments


  • eilidh

    I had the same issue with missing LoFTEE annotations and was able to resolve it by adding some additional paths and plugin details to the .json file. The .json I used is below - hope it works for you too! 

    {"command": [
         "docker", "run", "-i", "-v", "/cluster/vep:/root/.vep", "dnanexus/dxjupyterlab-vep",
         "./vep", "--format", "vcf", "__OUTPUT_FORMAT_FLAG__", "--everything", "--allele_number",
         "--no_stats", "--cache", "--offline", "--minimal", "--assembly", "GRCh38", "-o", "STDOUT",
         "--dir_cache", "/root/.vep/", "--dir_plugins", "/root/.vep/Plugins/loftee",
         "--fasta", "/root/.vep/homo_sapiens/109_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
         "--plugin", "LoF,loftee_path:/root/.vep/Plugins/loftee,human_ancestor_fa:/root/.vep/human_ancestor.fa,filter_position:0.05,min_intron_size:15,conservation_file:/root/.vep/loftee.sql,gerp_bigwig:/root/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw"],
      "env": {
          "PERL5LIB": "/root/.vep/Plugins"
      },
      "vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele:String,amr_maf:Float64,clin_sig:Array[String],end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele:String,exac_fin_maf:Float64,exac_maf:Float64,exac_nfe_allele:String,exac_nfe_maf:Float64,exac_oth_allele:String,exac_oth_maf:Float64,exac_sas_allele:String,exac_sas_maf:Float64,id:String,minor_allele:String,minor_allele_freq:Float64,phenotype_or_disease:Int32,pubmed:Array[Int32],sas_allele:String,sas_maf:Float64,somatic:Int32,start:Int32,strand:Int32}],context:String,end:Int32,id:String,input:String,intergenic_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],impact:String,minimised:Int32,variant_allele:String}],most_severe_consequence:String,motif_feature_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],high_inf_pos:String,impact:String,minimised:Int32,motif_feature_id:String,motif_name:String,motif_pos:Int32,motif_score_change:Float64,strand:Int32,variant_allele:String}],regulatory_feature_consequences:Array[Struct{allele_num:Int32,biotype:String,consequence_terms:Array[String],impact:String,minimised:Int32,regulatory_feature_id:String,variant_allele:String}],seq_region_name:String,start:Int32,strand:Int32,transcript_consequences:Array[Struct{allele_num:Int32,amino_acids:String,appris:String,biotype:String,canonical:Int32,ccds:String,cdna_start:Int32,cdna_end:Int32,cds_end:Int32,cds_start:Int32,codons:String,consequence_terms:Array[String],distance:Int32,domains:Array[Struct{db:String,name:String}],exon:String,gene_id:String,gene_pheno:Int32,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int32,impact:String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int32,polyphen_prediction:String,polyphen_score:Float64,protein_end:Int32,protein_start:Int32,protein_id:String,sift_prediction:String,sift_score:Float64,strand:Int32,swissprot:String,transcript_id:String,trembl:String,tsl:Int32,uniparc:String,variant_allele:String}],variant_class:String}"
    }
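
Before rerunning the annotation, it may help to sanity-check a config file against the pieces this .json relies on. The sketch below is a heuristic only: the thread does not establish which of these entries is the one that fixes the missing annotations, and check_loftee_config is a hypothetical helper, not part of Hail or VEP. It looks for a "--plugin" entry naming LoF, a "--dir_plugins" entry, and the four lof fields that appear in the schema above.

```python
import json
import re

# Field names taken from the transcript_consequences struct in the config above.
LOF_FIELDS = ("lof", "lof_flags", "lof_filter", "lof_info")


def check_loftee_config(config_text: str) -> list:
    """Return a list of human-readable problems found in a VEP config (heuristic)."""
    config = json.loads(config_text)
    command = config.get("command", [])
    # Ignore stray whitespace inside the schema string.
    schema = re.sub(r"\s+", "", config.get("vep_json_schema", ""))
    problems = []

    # The argument after each --plugin flag is "<PluginName>,<option>,...".
    plugins = [command[i + 1] for i, arg in enumerate(command[:-1]) if arg == "--plugin"]
    if not any(p.split(",")[0] == "LoF" for p in plugins):
        problems.append("no '--plugin LoF,...' entry in command")
    if "--dir_plugins" not in command:
        problems.append("no --dir_plugins entry in command")

    # Each field must start right after '{' or ',' so 'lof' does not match 'lof_flags'.
    for field in LOF_FIELDS:
        if not re.search(r"[{,]" + re.escape(field) + r":String", schema):
            problems.append(f"schema lacks field {field}")
    return problems


# Deliberately incomplete example config to show what gets reported.
example = json.dumps({
    "command": ["./vep", "--plugin", "LoF,loftee_path:/root/.vep/Plugins/loftee"],
    "vep_json_schema": "Struct{lof:String}",
})
problems = check_loftee_config(example)
```

An empty list only means these entries are present; it does not confirm that the referenced files exist inside the container.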

