I used PJ Greer's scripts to perform a GWAS on UKB RAP with my phenotype, using the genotype array data for Step 1 and the full final exome sequencing release for Step 2. As far as I know, I performed all the quality control steps. After QC, my genotype array data had ~300,000 SNPs, far fewer than the ~500,000 I started with. After removing variants with MAF < 0.01 for Step 2, I ended up with about 150,000 SNPs. I found some significant hits, including one that replicates a previous study, but I'm wondering: is that a reasonable number of SNPs to end up with? It doesn't feel like enough compared to the ~10 million I'm used to. Thanks for any thoughts!
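To see why a MAF threshold can remove so many variants, it helps to look at how MAF is computed. Below is a minimal sketch, assuming per-sample alternate-allele dosages coded 0/1/2 with `None` for missing calls. The function names and toy data are hypothetical and are not part of PJ Greer's scripts. In practice this filtering is done by tools such as plink2 (`--maf 0.01`). Exome sequencing captures many rare variants, and a filter like this discards every one of them.

```python
def minor_allele_frequency(dosages):
    """Compute MAF from per-sample alt-allele dosages (0, 1, 2); None = missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    # Each diploid sample carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(variants, min_maf=0.01):
    """Return the IDs of variants whose MAF is at least min_maf (hypothetical helper)."""
    return [vid for vid, dosages in variants.items()
            if minor_allele_frequency(dosages) >= min_maf]


# Toy data: one common variant and one rare variant (1 carrier in 100 samples).
variants = {
    "common_var": [0, 1, 2, 1, 0, 1, 0, 0, 1, 2],  # MAF = 8/20 = 0.4
    "rare_var": [0] * 99 + [1],                     # MAF = 1/200 = 0.005
}
print(filter_by_maf(variants))  # only the common variant passes MAF >= 0.01
```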
My understanding is that you can experiment with the QC parameters and tune your GWAS accordingly. You might also validate hits by checking that they hold up across several runs with different settings. QC and other filtering can reduce Regenie runtime significantly. If you have enough resources, you could run Regenie on the full data and test whether it behaves similarly to the run with QC applied. There is another important step to consider when running a GWAS: the multiple-testing correction threshold. It is calculated from the number of input SNPs and might suppress some hits.
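As a rough illustration of how the number of tested SNPs drives the threshold, here is a minimal sketch assuming a plain Bonferroni correction at alpha = 0.05. This is one common choice, not necessarily what any particular pipeline uses. Bonferroni treats SNPs as independent, so it is conservative when SNPs are in linkage disequilibrium. The conventional genome-wide threshold of 5e-8 corresponds to roughly one million independent tests.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold under a Bonferroni correction."""
    return alpha / n_tests


# Compare the ~150,000 SNPs in this analysis with other common scales.
for n in (150_000, 300_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} tests -> p < {bonferroni_threshold(n):.2e}")
```

With fewer SNPs the per-SNP threshold is less stringent (about 3.33e-07 for 150,000 tests). With ~10 million imputed variants it is much stricter, which is why the same p-value can be significant in one setup and not in another.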
Thank you for this. Do you have any experience using WES versus imputed data for Step 2 of Regenie? Will that make a difference? It seems to me that exome data would focus on exons (coding regions), leaving out introns, promoters, enhancers, etc., but I could be totally wrong.