Missing Loftee annotations when running with Hail
We've been trying to annotate variants from the WES data using VEP and Loftee via Hail in a Spark Cluster-Enabled DXJupyterLab. Our workflow is similar to the first 3 sections of this example notebook: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/MatrixTable_variant_annotation_with_VEP.ipynb
The main difference is that instead of reading in a precomputed Hail MatrixTable, we get one by loading in a WES pVCF:
mt = hl.import_vcf(
    format_path_to_vcf_for_hail(path_to_pvcf),
    force_bgz=True,
    reference_genome="GRCh38",
    array_elements_required=False,
)
We are using the example JSON shown on this page as the VEP config: https://documentation.dnanexus.com/user/jupyter-notebooks/dxjupyterlab-spark-cluster
When we do so, the vast majority of variants that are annotated as HIGH impact by VEP return "None" for Loftee, even though the same variants are annotated as pLoF in gnomAD. Additionally, when we run VEP + Loftee locally with a fresh install of VEP and Loftee (GRCh38 branch), we get HC Loftee annotations for these variants, matching gnomAD. Any idea what might be the cause?
This behaviour should be replicable using a small VCF containing only this variant: 1-19657079-G-T (GRCh38)
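To put a number on "the vast majority", one option is to tally the Loftee calls among HIGH-impact transcript consequences. The following is a minimal sketch in plain Python, assuming each record is a dict shaped like VEP's JSON output; the `transcript_consequences`, `impact`, and `lof` field names match the `vep_json_schema` used in this thread, while `summarize_lof` and the example records are hypothetical and made up purely for illustration.

```python
from collections import Counter


def summarize_lof(records):
    """Count Loftee calls among HIGH-impact transcript consequences.

    A consequence whose `lof` value is None (or absent) is counted
    under the key "missing".
    """
    counts = Counter()
    for rec in records:
        for tc in rec.get("transcript_consequences") or []:
            if tc.get("impact") != "HIGH":
                continue
            counts[tc.get("lof") or "missing"] += 1
    return counts


# Made-up records for illustration only; not real annotations.
example_records = [
    {"input": "variant_a", "transcript_consequences": [
        {"transcript_id": "TX1", "impact": "HIGH", "lof": None},
        {"transcript_id": "TX2", "impact": "MODIFIER", "lof": None},
    ]},
    {"input": "variant_b", "transcript_consequences": [
        {"transcript_id": "TX3", "impact": "HIGH", "lof": "HC"},
    ]},
]

print(dict(summarize_lof(example_records)))
```

Running the same tally before and after a config change would show whether the proportion of missing calls actually drops.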
Comments
I had the same issue with missing LoFTEE annotations and was able to resolve it by adding some additional paths and plugin details to the .json file. The .json I used is below - hope it works for you too!
{"command": [
"docker", "run", "-i", "-v", "/cluster/vep:/root/.vep", "dnanexus/dxjupyterlab-vep",
"./vep", "--format", "vcf", "__OUTPUT_FORMAT_FLAG__", "--everything", "--allele_number",
"--no_stats", "--cache", "--offline", "--minimal", "--assembly", "GRCh38", "-o", "STDOUT",
"--dir_cache", "/root/.vep/", "--dir_plugins", "/root/.vep/Plugins/loftee",
"--fasta", "/root/.vep/homo_sapiens/109_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
"--plugin", "LoF,loftee_path:/root/.vep/Plugins/loftee,human_ancestor_fa:/root/.vep/human_ancestor.fa,filter_position:0.05,min_intron_size:15,conservation_file:/root/.vep/loftee.sql,gerp_bigwig:/root/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw"],
"env": {
"PERL5LIB": "/root/.vep/Plugins"
},
"vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele:String,amr_maf:Float64,clin_sig:Array[String],end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele:String,exac_fin_maf:Float64,exac_maf:Float64,exac_nfe_allele:String,exac_nfe_maf:Float64,exac_oth_allele:String,exac_oth_maf:Float64,exac_sas_allele:String,exac_sas_maf:Float64,id:String,minor_allele:String,minor_allele_freq:Float64,phenotype_or_disease:Int32,pubmed:Array[Int32],sas_allele:String,sas_maf:Float64,somatic:Int32,start:Int32,strand:Int32}],context:String,end:Int32,id:String,input:String,intergenic_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],impact:String,minimised:Int32,variant_allele:String}],most_severe_consequence:String,motif_feature_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],high_inf_pos:String,impact:String,minimised:Int32,motif_feature_id:String,motif_name:String,motif_pos:Int32,motif_score_change:Float64,strand:Int32,variant_allele:String}],regulatory_feature_consequences:Array[Struct{allele_num:Int32,biotype:String,consequence_terms:Array[String],impact:String,minimised:Int32,regulatory_feature_id:String,variant_allele:String}],seq_region_name:String,start:Int32,strand:Int32,transcript_consequences:Array[Struct{allele_num:Int32,amino_acids:String,appris:String,biotype:String,canonical:Int32,ccds:String,cdna_start:Int32,cdna_end:Int32,cds_end:Int32,cds_start:Int32,codons:String,consequence_terms:Array[String],distance:Int32,domains:Array[Struct{db:String,name:String}],exon:String,gene_id:String,gene_pheno:Int32,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int32,impact:String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int32,polyphen_prediction:String,polyphen_score:Float64,protein_end:Int32,protein_start:Int32,protein_id:String,sift_prediction:String,sift_score:Float64,strand:Int32,swissprot:String,transcript_id:String,trembl:String,tsl:Int32,uniparc:String,variant_allele:String}],variant_class:String}"
}
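Before handing a config like this to Hail, it may help to sanity-check it. The sketch below is only an illustration: it assumes that the pieces present in the config above (a `--plugin LoF,...` argument carrying `loftee_path`, a `PERL5LIB` entry under `env`, and the four `lof*` fields in `vep_json_schema`) are the ones that matter for Loftee output. `check_vep_config` is a hypothetical helper, not part of Hail or VEP, and the example config is a heavily trimmed stand-in for the full one.

```python
LOF_FIELDS = ("lof", "lof_flags", "lof_filter", "lof_info")


def check_vep_config(config):
    """Return a list of human-readable problems found in a parsed VEP config."""
    problems = []

    # Collect the argument that follows each "--plugin" flag.
    command = config.get("command", [])
    plugin_args = [command[i + 1]
                   for i, arg in enumerate(command[:-1]) if arg == "--plugin"]
    lof_args = [a for a in plugin_args if a.split(",")[0] == "LoF"]
    if not lof_args:
        problems.append("command has no '--plugin LoF,...' argument")
    else:
        # Plugin options are comma-separated key:value pairs after the name.
        options = dict(o.split(":", 1)
                       for o in lof_args[0].split(",")[1:] if ":" in o)
        if "loftee_path" not in options:
            problems.append("LoF plugin argument has no loftee_path option")

    if "PERL5LIB" not in config.get("env", {}):
        problems.append("env does not set PERL5LIB")

    # Ignore whitespace so a wrapped or spaced-out schema still matches.
    schema = "".join(config.get("vep_json_schema", "").split())
    for field in LOF_FIELDS:
        if f"{field}:String" not in schema:
            problems.append(f"vep_json_schema is missing {field}:String")
    return problems


# Trimmed stand-in for the config above (assumption: same keys, fewer values).
example_config = {
    "command": ["./vep", "--cache", "--offline",
                "--plugin", "LoF,loftee_path:/root/.vep/Plugins/loftee"],
    "env": {"PERL5LIB": "/root/.vep/Plugins"},
    "vep_json_schema": "Struct{transcript_consequences:Array[Struct{"
                       "impact:String,lof:String,lof_flags:String,"
                       "lof_filter:String,lof_info:String}]}",
}

print(check_vep_config(example_config))
print(check_vep_config(dict(example_config, command=["./vep", "--cache"])))
```

In practice the dict would come from `json.load` on the real config file; a file that fails to parse at that step (for example because of a line break inside the schema string) is itself worth ruling out.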