Should there be a pruning step in the REGENIE step1 workflow?

I noticed there is no pruning step in the b37 to GRCh38 liftover workflow, nor in the REGENIE QC workflow on GitHub. A pruning step is mentioned in the Nature paper, and LD pruning is commonly used to reduce the data before calculating PCs or a GRM. I believe it would also reduce the amount of data written out after liftover. Has anyone looked at whether adding LD pruning makes a difference in the GWAS output?

Comments

2 comments

  • Comment author
    Anastazie Sedlakova DNAnexus Team

    In our internal experiment, we saw that pruning reduces the number of variants and thus the time spent on the subsequent analysis (e.g. liftOver and step 1 of GWAS). The number of variants left depends on the pruning thresholds, and so do the time and space saved. While pruning reduces the data, you may also need to consider at which stage to prune (e.g. before or after liftOver) when organizing the workflow.
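
To illustrate how the thresholds control how many variants survive, here is a minimal sketch of greedy, window-based LD pruning on a small in-memory genotype matrix. It is not the procedure used in the REGENIE workflows or in our experiment, and it is simpler than what tools such as PLINK's `--indep-pairwise` do. The function names, the `window` and `r2_threshold` parameters, and the toy genotypes are all assumptions made for illustration.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 dosages)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic variant: correlation is undefined, treat it as unlinked.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def greedy_ld_prune(
    genotypes: List[Sequence[float]],
    window: int = 50,
    r2_threshold: float = 0.2,
) -> List[int]:
    """Return indices of variants kept after greedy pruning.

    genotypes: one row per variant (in genomic order), one column per sample.
    window: hypothetical window size, counted in variant index positions.
    r2_threshold: a variant is dropped if its r^2 with any already-kept
        variant inside the window exceeds this value.
    """
    kept: List[int] = []
    for j, current in enumerate(genotypes):
        nearby_kept = [i for i in kept if j - i <= window]
        if all(r_squared(genotypes[i], current) <= r2_threshold for i in nearby_kept):
            kept.append(j)
    return kept


# Toy data (made up): variant 1 duplicates variant 0, variant 2 is uncorrelated.
toy_genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 0, 0, 2, 1, 2],
]
kept_indices = greedy_ld_prune(toy_genotypes, window=50, r2_threshold=0.2)
print(f"kept {len(kept_indices)} of {len(toy_genotypes)} variants: {kept_indices}")
```

A stricter (lower) `r2_threshold` or a wider `window` removes more variants, which is why the time and space saved depend on the settings chosen.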

     

    How much pruning affects the GWAS results may require more experiments to determine. You may want to check papers that discuss this. We found that SAIGE's FAQ suggests that the number of markers should be larger than the sample size (https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE#Frequently-asked-questions, #4). The actual impact may vary depending on the GWAS method and the use case.

  • Comment author
    Former User of DNAx Community_15

    Very useful comment and link to the FAQ, thanks!

