High missingness on chromosome 18 in DRAGEN WGS data
When running QC with PLINK2 on the DRAGEN WGS data, I see a very large fraction of samples with high missingness, but only on chr18. Mean per-sample missingness there is around 14%, and 99% of the donors I'm looking at have missingness above 10%.
The other chromosomes are fine. Is this something I've messed up, or a known artifact?
Comments
Looking deeper at the per-variant missingness rates, this appears to be partly due to the large relative size of the centromeric region (7% of the chromosome length) and the number of variants in that region (25% of all variants on the chromosome).
The default --mind threshold in plink2, 0.1, is fine for the other chromosomes, but when running an association test on chr18 it rejects almost every donor.
Dear Thouis,
To reduce the chance of removing too many participants based on missingness, filter out the non-PASS variants first, before applying the sample-level missingness filter.
Hope this helps,
George
Thanks. Unfortunately, PLINK2 applies the donor missingness filter before the variant missingness filter, so that approach requires creating a new set of variant files before running a GWAS. (https://www.cog-genomics.org/plink/2.0/order)
I'm going to try excluding the centromeric regions explicitly. It looks like region filtering happens before --mind.
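A minimal sketch of how that exclusion could be set up, assuming PLINK's `--exclude range` accepts a file with one region per line in the form chromosome, start (bp), end (bp), set ID. The function names, the file name, and the set ID are all hypothetical, and the centromere coordinates are left as placeholders because they depend on the reference build.

```python
def format_range_line(chrom, start_bp, end_bp, set_id):
    """Return one line of a range file: chromosome, start, end, set ID."""
    if start_bp < 1 or end_bp < start_bp:
        raise ValueError("expected 1 <= start_bp <= end_bp")
    return f"{chrom}\t{start_bp}\t{end_bp}\t{set_id}\n"


def write_range_file(path, regions):
    """Write (chrom, start_bp, end_bp, set_id) tuples, one region per line."""
    with open(path, "w") as handle:
        for chrom, start_bp, end_bp, set_id in regions:
            handle.write(format_range_line(chrom, start_bp, end_bp, set_id))


# Hypothetical usage; CEN_START and CEN_END are placeholders to be filled in
# from the centromere annotation for your reference build:
#   write_range_file("chr18_centromere.txt",
#                    [("18", CEN_START, CEN_END, "cen18")])
# The resulting file would then be passed to plink2 alongside --mind, e.g.
#   plink2 --pfile <chr18 prefix> --exclude range chr18_centromere.txt --mind 0.1 ...
```

Because position-based variant filters are applied before --mind, per-sample missingness would then be computed only over the variants outside the excluded region.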
You can use the new ML-Corrected DRAGEN PVAR files. They have a FILTER column. You can create either an exclude file or an extract file by selecting the variants with FILTER!=PASS or FILTER==PASS, respectively.
Thanks for the idea, but there appear to be thousands of files in those directories, and extracting the variants from all of them would take days. Is there a pre-existing list of passing variants, or a faster way to extract them that I'm not aware of?
I use PVAR files under `DRAGEN population level WGS variants, PLINK format [500k release]`. They are per chromosome. You can still find the FILTER column in those PVAR files.
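For reference, here is a rough sketch of how the FILTER==PASS extract list could be built from one per-chromosome PVAR file in a single pass. It assumes an uncompressed, tab-delimited .pvar with a `#CHROM` header line that includes `ID` and `FILTER` columns; the function names and file names are hypothetical.

```python
def passing_variant_ids(lines):
    """Yield the ID of every variant whose FILTER value is exactly PASS."""
    id_idx = filter_idx = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            columns = line.lstrip("#").split("\t")
            id_idx = columns.index("ID")
            filter_idx = columns.index("FILTER")  # ValueError if absent
            continue
        if filter_idx is None:
            raise ValueError("no #CHROM header line before the first variant")
        fields = line.split("\t")
        if fields[filter_idx] == "PASS":
            yield fields[id_idx]


def write_extract_file(pvar_path, out_path):
    """Stream a .pvar file and write one passing variant ID per line."""
    count = 0
    with open(pvar_path) as pvar, open(out_path, "w") as out:
        for variant_id in passing_variant_ids(pvar):
            out.write(variant_id + "\n")
            count += 1
    return count


# Hypothetical usage (file names are placeholders):
#   write_extract_file("chr18.pvar", "chr18_pass_ids.txt")
# The ID list could then be given to plink2 via --extract.
```

Since --extract is applied before --mind, the sample missingness check would then only see the passing variants, without writing a new set of genotype files first.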