I have fitted null models with SAIGE using a GRM for a quantitative trait, and I want to run GWAS on multiple subsets. I would like to understand which of these workflows is better:
1.) Generate sparse GRM ==> Generate null model ==> GWAS
2.) Generate complete GRM ==> Generate null model ==> GWAS
Also, is it mandatory to keep the inverse normalization step for continuous traits, or can I use the phenotypes as is?
Please do let me know about this.
Regards
Akhil
I know that @Mike Tran recently ran some nice SAIGE experiments, so he may know. Since this seems to me more of a question for the SAIGE team, you might also ask here: https://github.com/weizhouUMICH/SAIGE/issues
SAIGE/SAIGE-GENE has two steps, step 1 and step 2. Step 0 (creating a sparse GRM) is optional but speeds up the process significantly. A sparse GRM only needs to be created once per dataset and can be reused for all phenotypes, as long as all tested samples are in the sparse GRM (https://saigegit.github.io/SAIGE-doc/docs/createSparseGRM.html).
Step 1:
You can provide the sparse GRM obtained from step 0 to fit the null model (--useSparseGRMtoFitNULL=TRUE). Along with this, random markers need to be extracted from the input plink file.
A pre-computed full GRM is not needed: if the sparse GRM is not used to fit the null model (--useSparseGRMtoFitNULL=FALSE), the full GRM is calculated on the fly. However, this process is very slow.
It seems that, in the long run, generating a sparse GRM is beneficial because it speeds up computation for step 1 and step 2. You can reuse the same sparse GRM to run GWAS on different phenotypes. The SAIGE-GENE article explains in more detail how the sparse GRM is used to better approximate the variance of the score statistic (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7871731/).
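As a rough sketch of what reusing one sparse GRM across phenotypes could look like, the Python snippet below only builds the step 1 command lines and prints them. The file paths, the phenotype column names, and the helper `build_step1_args` are hypothetical; the flags are the ones used in the SAIGE documentation examples, and a real run will usually need further arguments (covariates, plink input, threads) as described there.

```python
# Hypothetical paths and column names, for illustration only.
SPARSE_GRM = "output/sparseGRM.mtx"
SPARSE_GRM_IDS = "output/sparseGRM.mtx.sampleIDs.txt"


def build_step1_args(pheno_col, pheno_file="input/pheno.txt", sample_id_col="IID"):
    """Return the argument list for one SAIGE step 1 run that fits the
    null model with a precomputed sparse GRM (hypothetical helper)."""
    return [
        "Rscript", "step1_fitNULLGLMM.R",
        f"--sparseGRMFile={SPARSE_GRM}",
        f"--sparseGRMSampleIDFile={SPARSE_GRM_IDS}",
        "--useSparseGRMtoFitNULL=TRUE",
        f"--phenoFile={pheno_file}",
        f"--phenoCol={pheno_col}",
        f"--sampleIDColinphenoFile={sample_id_col}",
        "--traitType=quantitative",
        "--invNormalize=TRUE",
        f"--outputPrefix=output/{pheno_col}_sparseGRM",
    ]


# The same sparse GRM files are reused for every phenotype.
phenotypes = ["trait_a", "trait_b"]  # hypothetical phenotype columns
commands = [build_step1_args(p) for p in phenotypes]
for cmd in commands:
    print(" ".join(cmd))
    # To actually launch a run: subprocess.run(cmd, check=True)
```

Only the phenotype column and the output prefix change between runs; the sparse GRM arguments stay identical, which is what makes step 0 a one-time cost.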
Using sparse GRM: step 1 took 11 seconds, step 2 took 15 seconds
Using full GRM: step 1 took 14959 seconds (>4 hours), step 2 took 8 seconds
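To put these timings in perspective, here is a small Python calculation of the totals and the speedup. It uses only the four numbers reported above, which come from a single run on a small example data set, so the ratios on real data will likely differ.

```python
# Timings (seconds) reported above for the example data set.
timings = {
    "sparse GRM": {"step1": 11, "step2": 15},
    "full GRM": {"step1": 14959, "step2": 8},
}

# Total wall time per workflow (step 1 + step 2).
totals = {name: t["step1"] + t["step2"] for name, t in timings.items()}
step1_speedup = timings["full GRM"]["step1"] / timings["sparse GRM"]["step1"]
total_speedup = totals["full GRM"] / totals["sparse GRM"]

for name, total in totals.items():
    print(f"{name}: {total} s total ({total / 3600:.2f} h)")
print(f"step 1 speedup with sparse GRM: {step1_speedup:.0f}x")
print(f"end-to-end speedup with sparse GRM: {total_speedup:.0f}x")
```

Step 2 was slightly slower with the sparse GRM (15 s versus 8 s), but that difference is negligible next to the step 1 saving.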
I would recommend the first workflow: Generate sparse GRM ==> Generate null model ==> GWAS. Also, according to the tutorial, the --invNormalize flag needs to be set to TRUE for quantitative traits.
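For intuition about what inverse normalization does to a phenotype, here is a minimal Python sketch of a rank-based inverse normal transform. It is one common variant (rank offset 0.5, ties given their average rank) and is an assumption for illustration; SAIGE's own implementation behind --invNormalize may differ in its details. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to standard normal quantiles based on their ranks.

    Uses (rank - 0.5) / n as the quantile; tied values share their
    average rank. Illustrative variant, not SAIGE's exact code.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# A skewed toy phenotype with one extreme value (hypothetical data).
phenotype = [1.2, 50.0, 3.4, 2.2, 2.9]
transformed = inverse_normal_transform(phenotype)
print([round(x, 3) for x in transformed])
```

The transform keeps the ordering of the samples but pulls the outlier in, which is why it is often applied to quantitative traits that are not normally distributed.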
Thank you so much. I ran SAIGE based on step 1 and was able to complete the analysis. It took some time, but it finally finished and I can proceed with the next steps.
For the benchmark, I followed the authors' tutorial on running set-based tests (https://saigegit.github.io/SAIGE-doc/docs/set_example.html), comparing the sparse and the full GRM for a quantitative trait. I used the sample input and output data provided by the authors at https://github.com/weizhouUMICH/SAIGE/tree/master/extdata. Everything was run with the Docker image wzhou88/saige:1.1.6.3 on my MacBook with 16GB RAM.
I would also highly recommend contacting the author at https://github.com/weizhouUMICH/SAIGE/issues.