Hello, I have a question regarding SAIGE

21 March 2023 00:00
3 comments

I have generated null models with SAIGE, using a GRM, for a quantitative trait, and I want to run GWAS for multiple subsets. I would like to understand which workflow is better:

1.) Generate sparse GRM ==> Generate null model ==> GWAS
2.) Generate complete GRM ==> Generate null model ==> GWAS

Also, is the inverse normalization step mandatory for continuous traits, or can I use the phenotypes as is?

Please let me know.

Regards,
Akhil

Comments


Ondrej Klempir DNAnexus Team
- 28 March 2023 03:01
I know that @Mike Tran recently ran some nice SAIGE experiments, so he may know. Since this seems more like a question for the SAIGE team, you might also ask here: https://github.com/weizhouUMICH/SAIGE/issues

Former User of DNAx Community_73
- 28 March 2023 14:40
Hi Akhil,

SAIGE/SAIGE-GENE has two steps, step 1 and step 2. Step 0 (creating a sparse GRM) is optional but speeds up the process significantly. A sparse GRM only needs to be created once per dataset and can be reused for all phenotypes, as long as all tested samples are in the sparse GRM (https://saigegit.github.io/SAIGE-doc/docs/createSparseGRM.html).
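
Since the sparse GRM can only be reused when every tested sample is in it, a quick check before each subset run may help. Below is a minimal sketch in Python; the function name `missing_from_grm` and the sample IDs are hypothetical, and it assumes you have already loaded the IDs from your own files.

```python
def missing_from_grm(subset_ids, grm_ids):
    """Return the subset sample IDs that are absent from the sparse GRM, sorted."""
    grm = set(grm_ids)
    return sorted(set(subset_ids) - grm)


# Hypothetical sample IDs, for illustration only.
grm_ids = ["S1", "S2", "S3", "S4"]
subset_a = ["S1", "S3"]
subset_b = ["S2", "S5"]

print(missing_from_grm(subset_a, grm_ids))  # [] -> the sparse GRM covers this subset
print(missing_from_grm(subset_b, grm_ids))  # ['S5'] -> S5 is not in the sparse GRM
```

An empty result means the existing sparse GRM can be reused for that subset.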

Step 1:
- The user can provide the sparse GRM obtained from step 0 to fit the null model (--useSparseGRMtoFitNULL=TRUE). Along with this, random markers need to be extracted from the input plink file.
- A pre-computed full GRM is not needed: if the sparse GRM is not used to fit the null model (--useSparseGRMtoFitNULL=FALSE), the full GRM is calculated on the fly. However, this process is very slow.
- Even when the sparse GRM is not used to fit the null model, it can still be used to estimate the variance ratio(s) (--useSparseGRMforVarRatio=TRUE; https://saigegit.github.io/SAIGE-doc/docs/set_step1.html).
- For a quantitative trait, a null linear mixed model is fitted (--traitType=quantitative), and the tutorial states that the trait needs to be inverse normalized (--invNormalize=TRUE; https://saigegit.github.io/SAIGE-doc/docs/single_step1.html#for-quantitative-traits-a-null-linear-mixed-model-will-be-fitted-traittypequantitative-and-needs-to-be-inverse-normalized-invnormalizetrue-).

It seems that in the long run, generating a sparse GRM would be beneficial in speeding up the computation time for step 1 and step 2. The user can reuse the same sparse GRM to run GWAS on different phenotypes. The SAIGE-GENE article explains in more detail how the sparse GRM is used to better approximate the variance of the score statistic (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7871731/).
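
To make the step 1 flag combinations easier to compare, here is a small Python sketch that assembles only the flags mentioned in this comment. The helper `step1_args` is hypothetical, not part of SAIGE. Input and output file options are deliberately left out and should be taken from the SAIGE documentation.

```python
def step1_args(trait_type, use_sparse_grm, sparse_for_var_ratio=False):
    """Assemble the step 1 flags discussed in this thread as a list of strings."""
    def flag(value):
        return "TRUE" if value else "FALSE"

    args = [
        f"--traitType={trait_type}",
        f"--useSparseGRMtoFitNULL={flag(use_sparse_grm)}",
    ]
    if trait_type == "quantitative":
        # Per the tutorial, quantitative traits are inverse normalized.
        args.append("--invNormalize=TRUE")
    if not use_sparse_grm and sparse_for_var_ratio:
        # Only the case described above: full GRM for the null model,
        # sparse GRM for the variance ratio(s).
        args.append("--useSparseGRMforVarRatio=TRUE")
    return args


print(" ".join(step1_args("quantitative", use_sparse_grm=True)))
print(" ".join(step1_args("quantitative", use_sparse_grm=False, sparse_for_var_ratio=True)))
```

The first call corresponds to workflow 1 (sparse GRM), the second to fitting with the full GRM while still using the sparse GRM for the variance ratio.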

I followed the author's tutorial on running set-based tests (https://saigegit.github.io/SAIGE-doc/docs/set_example.html) to benchmark the sparse GRM against the full GRM for a quantitative trait. I obtained the sample input and output data from the authors at https://github.com/weizhouUMICH/SAIGE/tree/master/extdata. All runs used the docker image wzhou88/saige:1.1.6.3 on my MacBook with 16GB RAM.
- Using sparse GRM: step 1 took 11 seconds, step 2 took 15 seconds
- Using full GRM: step 1 took 14959 seconds (>4 hours), step 2 took 8 seconds
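
For reference, the arithmetic behind these timings can be reproduced with a few lines of Python. The numbers are the ones reported above; the variable names are just for illustration.

```python
# Wall-clock seconds reported in the benchmark above.
timings = {
    "sparse GRM": {"step1": 11, "step2": 15},
    "full GRM": {"step1": 14959, "step2": 8},
}

totals = {name: t["step1"] + t["step2"] for name, t in timings.items()}
step1_ratio = timings["full GRM"]["step1"] / timings["sparse GRM"]["step1"]
full_step1_hours = timings["full GRM"]["step1"] / 3600

print(totals)                                      # {'sparse GRM': 26, 'full GRM': 14967}
print(f"step 1 ratio: {step1_ratio:.0f}x")         # step 1 ratio: 1360x
print(f"full GRM step 1: {full_step1_hours:.2f} h")  # full GRM step 1: 4.16 h
```

Step 2 was slightly slower with the sparse GRM on this small example, but the step 1 difference dominates the total.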
I would recommend the first workflow: Generate sparse GRM ==> Generate null model ==> GWAS. According to the tutorial, the flag --invNormalize needs to be TRUE for quantitative traits.
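
For intuition about what inverse normalization does to a phenotype, here is a minimal Python sketch of a common rank-based inverse normal transform. This is an illustration under my own assumptions (average ranks for ties, an offset of 0.5); I have not confirmed that it matches SAIGE's exact computation, and the function name and phenotype values are hypothetical.

```python
from statistics import NormalDist


def inverse_normalize(values, offset=0.5):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    # Map each rank to a quantile of the standard normal distribution.
    return [normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical skewed phenotype with one extreme value.
phenotype = [2.1, 150.0, 3.4, 2.9, 3.0]
z = inverse_normalize(phenotype)
print([round(v, 3) for v in z])
```

The transform keeps the ordering of the samples but pulls the extreme value (150.0) onto the same normal scale as the rest, which is why it is often applied to skewed continuous traits.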

I would also highly recommend contacting the author at https://github.com/weizhouUMICH/SAIGE/issues.
Former User of DNAx Community_6
- 28 March 2023 20:49
Thank you so much. I ran SAIGE based on step 1 and was able to complete the analysis. It took some time, but I finally completed it and proceeded with the next steps.

