Which Genotype Dataset in UK Biobank Is Most Comprehensive for PRS Calculation, and Can Training and Testing PRS Be Combined?
1. In the UKB bulk folder, there are multiple genotype datasets, such as Imputation, Genotype Results, and WGS (several). Which dataset contains the most comprehensive set of individuals (i.e., is the most suitable to use as a target for PRS calculation to estimate the genetic risk of UKB participants for a specific disease)?
2. UKB already provides some PRS for participants (some samples were used for training, while others were used for testing). Can these two types of PRS be combined to represent the genetic risk scores for all UKB individuals (e.g., to represent genetic risk for type 2 diabetes)?
Hi Lu Ao,
The WGS data has the most comprehensive set of individuals. Please see Showcase Field 22418, which says that 487921 participants have genotyping data, together with Field 32050, which says that 469574 participants have WES data, and Field 32069, which says that 490294 participants have WGS data. The imputation data is derived entirely from the genotyping data, so it cannot include more individuals than the genotyping data does.
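For anyone who wants to redo this comparison, here is a minimal Python sketch that ranks the three datasets by the participant counts quoted above. The dictionary labels and the function name `rank_by_coverage` are made up for illustration and are not UKB identifiers. The counts are copied from this post, so please re-check them against Showcase, as they may change between releases.

```python
# Participant counts as quoted in this reply (UKB Showcase fields).
# The labels are illustrative only, not official UKB dataset names.
participants = {
    "genotyping (Field 22418)": 487_921,
    "WES (Field 32050)": 469_574,
    "WGS (Field 32069)": 490_294,
}


def rank_by_coverage(counts):
    """Return (dataset, count) pairs sorted from most to fewest participants."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_coverage(participants)
largest_name, largest_n = ranked[0]

for name, n in ranked:
    # Show how far each dataset falls short of the largest one.
    print(f"{name}: {n} participants ({largest_n - n} fewer than {largest_name})")
```

Note that this only compares totals. It does not show whether the same individuals appear in each dataset; that would need the actual sample ID lists.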
The WGS data also has the most comprehensive set of positions. However, the WGS data is considerably more costly and difficult to work with. I cannot advise which is most suitable for PRS calculation.
Please see Category 300, the publication by Thompson et al., and Resource 5202, and then contact the address provided in the resource for any remaining questions about the PRS data.
Thank you for using the forum.