Many sample-level WGS gVCFs lack mitochondrial variant calls in UKB
I'm working on a UK Biobank project that involves looking at joint-genotyped mitochondrial variants for my cohorts, based on the gVCF-format WGS variant calls (field 23191). I'm finding that a fairly large proportion of participant chrM gVCFs, roughly 18% (thousands of individuals), contain no variant calls at all, only the header. When I spot-check a few participants whose chrM gVCFs have variant calls, and a few whose gVCFs don't, against their whole genome CRAM files (field 23193) in IGV, I see reads mapped to chrM in both groups, along with distinct sites that look like they should have been called as variants.
I'm wondering if anybody knows why such a large number of UKB samples with WGS gVCFs would be so bereft of chrM variant calls? Could this be an issue with the data release? I haven't come across an explanation of why this might be in the UKB showcase, in these forums, or in some of the published methods describing the WGS processing (though I could definitely be missing something).
Here's a brief description of my procedure:
For the participants in my cohorts, I identify which ones have WGS gVCFs (field 23193) and use SAK's bcftools (bcftools view -r chrM -o ${outfile}) to pull out chrM. I then run bcftools stats to get the number of records. The chrM gVCFs that have variant records are joint-genotyped with GLnexus for downstream processing.
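To make the record-counting step concrete, here is a minimal sketch of how the chrM records could be split into reference blocks and actual variant calls. It reads VCF text from stdin and uses only awk, so it is not the bcftools stats step itself. The function name `count_chrM_records` is made up for illustration, and the rule for recognising a reference block (ALT is `.`, `<NON_REF>` or `<*>`) is an assumption that may not match the caller used for these gVCFs.

```shell
# Hypothetical helper: summarise the chrM records in a VCF/gVCF text stream (stdin).
# Assumption: a record is a reference block if its ALT column holds only
# ".", "<NON_REF>" or "<*>"; anything else is counted as a variant call.
count_chrM_records() {
  awk -F'\t' '
    /^#/         { next }   # skip header lines
    $1 != "chrM" { next }   # keep mitochondrial records only
    {
      total++
      if ($5 == "." || $5 == "<NON_REF>" || $5 == "<*>") ref++
      else var++
    }
    # Uninitialised counters print as 0, so a header-only file gives all zeros.
    END { printf "total=%d ref_blocks=%d variants=%d\n", total, ref, var }
  '
}
```

With an indexed gVCF, this could be fed by something like `bcftools view -H -r chrM sample.g.vcf.gz | count_chrM_records`, where `sample.g.vcf.gz` is a placeholder. A result of `total=0` would mean the file has no chrM records of any kind, not even reference blocks, which is a different situation from a sample that simply has no variant sites.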
Thanks in advance for any advice or help with this issue.
Comments
@UK Biobank DA Team?
If data is in the CRAM but not in the gVCF, that looks like some kind of QC filter. Can you see a similar loss of information from the CRAM to the gVCF in a nuclear chromosome?
I am not entirely sure that I have understood this correctly, so please check it yourself, but I think the Halldorsson paper says that the joint calls were made using GraphTyper with a parameter of AA-Score > 0.5.
Would this explain the missingness you are seeing?
Further update from our bioinformaticians:
For this query we can only recommend that the researcher either contacts the authors of the study (https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=180) or alternatively posts the question on the UKB-Genetics mailing list (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS).