High missingness in chromosome 18 DRAGEN WGS

Thouis R Jones

When running QC with PLINK2 on the DRAGEN WGS data, I see a very large fraction of samples with high missingness, but only on chr18. Mean per-sample missingness is around 14%, and 99% of the donors I'm looking at have missingness above 10%.

Again, this is only on chr18. Other chromosomes are fine. Is this something I've messed up, or a known artifact?

Comments

  • Thouis R Jones

    Looking deeper at the per-variant missingness rates, this appears to be partly from the large relative size of the centromeric region (7% of the chromosome length) and the number of variants in that region (25% of all variants).

    PLINK2's default --mind threshold of 0.1 is fine for the other chromosomes, but on chr18 it rejects almost every donor when running an association test.
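As a rough check on the figures above, per-sample missingness can be treated as a weighted average of missingness inside and outside the centromeric region. The sketch below solves that average for the missingness the region would need on its own to produce a given chromosome-wide value. The function name and the `outside` parameter are illustrative assumptions, not anything from PLINK2, and the calculation is only a bound, since the post says the region explains the effect "partly".

```python
def region_missingness_needed(overall: float,
                              region_fraction: float,
                              outside: float = 0.0) -> float:
    """Missingness inside a region needed to reach `overall`.

    overall         : chromosome-wide per-sample missingness (e.g. 0.14)
    region_fraction : fraction of variants that lie in the region (e.g. 0.25)
    outside         : assumed missingness for variants outside the region

    Solves  overall = region_fraction * inside + (1 - region_fraction) * outside
    for `inside`.
    """
    if not 0.0 < region_fraction <= 1.0:
        raise ValueError("region_fraction must be in (0, 1]")
    return (overall - (1.0 - region_fraction) * outside) / region_fraction


if __name__ == "__main__":
    # Figures quoted in the thread: 14% mean missingness, 25% of variants
    # in the centromeric region; zero missingness elsewhere is an assumption.
    print(region_missingness_needed(0.14, 0.25))
```

With the thread's numbers and no missingness elsewhere, the region would need roughly 56% missingness to account for the whole 14% by itself, which is well above the 0.1 threshold for any plausible level of missingness outside the region.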

  • George F (UKB Community team, Data Analyst)

    Dear Thouis,

    To reduce the chance of removing too many participants based on missingness, filter out the non-PASS variants first, and only then apply the sample-level missingness filter.

    Hope this helps

    George

  • Thouis R Jones

    Thanks. Unfortunately, PLINK2 filters donors first and then variants (see https://www.cog-genomics.org/plink/2.0/order), so this would require creating a new set of variant files before running a GWAS.

    I'm going to try excluding the centromeric regions explicitly. It looks like region filtering happens before --mind.
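One way the explicit exclusion might be prepared is to write the regions to a tab-separated interval file. The sketch below only formats such a file; the function name is hypothetical, the coordinates must come from the reader's own reference (none are given in this thread), and the chromosome label has to match the one used in the PVAR file. Recent PLINK 2 builds appear to accept an interval file via `--exclude bed0 <file>`, but that flag spelling is an assumption to check against the PLINK 2 documentation.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome label, start, end)


def format_bed_intervals(intervals: Iterable[Interval]) -> str:
    """Return BED-style text: one 'chrom<TAB>start<TAB>end' line per interval.

    Coordinates are written exactly as given; whether they are read as
    0-based or 1-based depends on how the file is later consumed.
    """
    lines = []
    for chrom, start, end in intervals:
        if start < 0 or end <= start:
            raise ValueError(f"bad interval for {chrom}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\n")
    return "".join(lines)


def write_bed_intervals(intervals: Iterable[Interval], out_path: str) -> None:
    """Write the formatted intervals to `out_path`."""
    with open(out_path, "w") as fh:
        fh.write(format_bed_intervals(intervals))
```

A call would look like `write_bed_intervals([("chr18", START, END)], "chr18_exclude.bed")`, where `START` and `END` are placeholders for the centromeric boundaries and the output filename is arbitrary.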

  • Ahmet Sayici

    You can use the new ML-Corrected DRAGEN PVAR files. They include a FILTER column, so you can build either an exclude list (FILTER != PASS) or an extract list (FILTER == PASS).
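The extract-list idea above could be sketched as follows. This is a minimal example that assumes an uncompressed, tab-separated PVAR file with a `#CHROM` header line containing `ID` and `FILTER` columns; the function names and file paths are hypothetical, and compressed files would need to be decompressed first.

```python
from typing import Iterable, Iterator


def passing_variant_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the ID of every variant row whose FILTER field is exactly PASS."""
    id_col = None
    filter_col = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate columns by name rather than assuming a fixed order.
            id_col = fields.index("ID")
            filter_col = fields.index("FILTER")
            continue
        if id_col is None or filter_col is None:
            raise ValueError("no #CHROM header line before variant rows")
        if fields[filter_col] == "PASS":
            yield fields[id_col]


def write_extract_list(pvar_path: str, out_path: str) -> int:
    """Stream a PVAR file and write one passing variant ID per line."""
    count = 0
    with open(pvar_path) as src, open(out_path, "w") as dst:
        for variant_id in passing_variant_ids(src):
            dst.write(variant_id + "\n")
            count += 1
    return count
```

The resulting ID list could then be given to PLINK2's `--extract`. Because the file is streamed line by line, memory use stays flat regardless of how many variants the chromosome has.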

  • Thouis R Jones

    Thanks for the idea, but there appear to be thousands of files in those directories, and extracting the variants from them would take days. Is there a pre-existing list of passing variants, or a faster way to extract them that I'm not aware of?

  • Ahmet Sayici

    I use the PVAR files under `DRAGEN population level WGS variants, PLINK format [500k release]`. There is one per chromosome, and they also contain the FILTER column.
