Best way to perform Hardy-Weinberg Equilibrium filtering on WES pVCF on the RAP? Can I rely on 'Ethnic Background' (Data-Field 21000) to delineate populations?

Permanently deleted user

To perform HWE filtering I need to split samples by population/ancestry so that each variant can be tested within each group, but it is unclear to me how to do this with UKB data. The reference paper (https://www.nature.com/articles/s41586-021-04103-z) talks about using array data to project samples onto a PCA of HapMap super populations to identify sample ancestry, but this approach does not seem very straightforward to me.

My plan is to rely on the population information provided by UKB. The best field I could find for this is 'Ethnic Background' (Data-Field 21000). Another relevant field is 'Genetic Ethnic Grouping' (Data-Field 22006), which flags participants who self-identified as 'White British' and have very similar genetic ancestry based on a principal components analysis of the genotypes.

So I guess I have two questions:

  1. Would the Ethnic Background field be valid for separating populations to perform HWE filtering?
  2. Would it be more advantageous to limit my study to just those who are included in the 'Genetic Ethnic Grouping' flag?
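
To make the question concrete, here is a minimal sketch of what I mean by per-population HWE filtering. It is not the reference paper's method, just an illustration under assumptions: the function names, the in-memory genotype layout, the group labels, and the 1e-6 threshold are all hypothetical, and it uses the chi-square approximation (1 degree of freedom) rather than the exact test, which is usually preferred for rare variants.

```python
import math
from collections import defaultdict


def hwe_chisq_pvalue(n_hom_ref, n_het, n_hom_alt):
    """P-value of the 1-d.f. chi-square HWE test for one biallelic variant."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0  # no called genotypes, nothing to test
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic in this group, test is undefined
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 d.f. is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def variants_passing_hwe(genotypes, sample_group, threshold=1e-6):
    """Keep variants whose HWE p-value is >= threshold in every group.

    genotypes:    {variant_id: {sample_id: alt allele count (0, 1, 2) or None}}
    sample_group: {sample_id: population label}, e.g. derived from
                  Data-Field 21000 or 22006 (hypothetical labels).
    threshold:    hypothetical cutoff, not a recommended value.
    """
    passing = []
    for variant_id, calls in genotypes.items():
        # Genotype counts [hom ref, het, hom alt] per population group.
        counts = defaultdict(lambda: [0, 0, 0])
        for sample_id, alt_count in calls.items():
            group = sample_group.get(sample_id)
            if group is None or alt_count is None:
                continue  # skip unassigned samples and missing calls
            counts[group][alt_count] += 1
        if all(hwe_chisq_pvalue(*c) >= threshold for c in counts.values()):
            passing.append(variant_id)
    return passing
```

The point of testing within each group separately is that pooling samples from populations with different allele frequencies produces an apparent heterozygote deficit even when every population is in equilibrium on its own, which is why the choice of grouping field matters.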
