Many sample-level WGS gVCFs lack mitochondrial variant calls in UKB

I'm working on a UK Biobank project that involves joint-genotyping mitochondrial variants for my cohorts from the gVCF-format WGS variant calls (field 23191). A fairly large proportion of participant chrM gVCFs, roughly 18% (thousands of individuals), contain only the header and no variant calls. When I spot-check a few participants whose chrM gVCFs have variant calls and a few whose chrM gVCFs don't against their whole-genome CRAM files (field 23193) in IGV, I see reads mapped to chrM in both cases, along with distinct sites that look like they should be called as variants.

Does anybody know why such a large number of UKB samples with WGS gVCFs would be so bereft of chrM variant calls? Could this be an issue with the data release? I haven't come across an explanation in the UKB Showcase, in these forums, or in the published methods describing the WGS processing (though I could definitely be missing something).

Here's a brief description of my procedure:

For the participants in my cohorts, I identify which ones have WGS gVCFs (field 23191) and use SAK's bcftools (bcftools view -r chrM -o ${outfile}) to pull out chrM. I then use bcftools stats to get the number of records. The chrM gVCFs that have variant records are then joint-genotyped with GLnexus for downstream processing.
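
As a rough illustration of the record check described above (not the actual bcftools stats call), the Python sketch below classifies an extracted chrM gVCF as header-only, reference-blocks-only, or containing variant records. The function names and status labels are hypothetical, and the sketch assumes that reference blocks carry only a symbolic ALT allele such as <NON_REF> or <*>, which depends on the variant caller.

```python
import gzip

# Assumption: ALT values that mark a reference block rather than a variant.
# Adjust this set to match the caller that produced the gVCFs.
REF_ONLY_ALTS = {"<NON_REF>", "<*>", "."}


def open_text(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes (bgzip files share them)
        return gzip.open(path, "rt")
    return open(path, "rt")


def classify_gvcf(path, contig="chrM"):
    """Return (status, counts) for records on `contig` in a gVCF.

    status is one of "header_only", "ref_blocks_only", "has_variants".
    """
    counts = {"variant": 0, "ref_block": 0}
    with open_text(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5 or fields[0] != contig:
                continue  # skip malformed lines and other contigs
            alts = set(fields[4].split(","))
            if alts <= REF_ONLY_ALTS:
                counts["ref_block"] += 1
            else:
                counts["variant"] += 1
    if counts["variant"]:
        status = "has_variants"
    elif counts["ref_block"]:
        status = "ref_blocks_only"
    else:
        status = "header_only"
    return status, counts
```

Separating "header_only" from "ref_blocks_only" may help narrow down the cause: a file with reference blocks but no variants was at least processed over chrM, whereas a file with no records at all was not.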

Thanks in advance for any advice or help with this issue.

Comments

  • Chai Fungtammasan (DNAnexus Team)

    @UK Biobank DA Team? 

  • Rachael W (UKB Community team, Data Analyst)

    If data is present in the CRAM but absent from the gVCF, that suggests some kind of QC filter. Do you see a similar loss of information between the CRAM and the gVCF on a nuclear chromosome?

  • Rachael W (UKB Community team, Data Analyst)

    I am not entirely sure that I have understood this correctly, so please check it yourself, but I think the Halldorsson paper says that the joint calls were made using GraphTyper with a parameter of AA-Score > 0.5.

    Would this explain the missingness you are seeing?

  • Rachael W (UKB Community team, Data Analyst)

    Further update from our bioinformaticians:

    For this query we can only recommend that the researcher either contacts the authors of the study (https://biobank.ndph.ox.ac.uk/ukb/label.cgi?id=180) or alternatively posts the question on the UKB-Genetics mailing list (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS).
