Webinar Nov 2: GWAS on the Research Analysis Platform using regenie
An introduction to running GWAS using regenie on the Research Analysis Platform. We will demonstrate how to run the analysis using a diabetes phenotype on the 300k data.
Date: Tuesday November 2, 3pm GMT/8am PDT
Topics include:
- Genomic file preprocessing and filtering using the Swiss Army Knife App
- Building phenotype/covariate files for cohorts using Spark JupyterLab
- Running regenie using the Swiss Army Knife App
Prior to the session, it is helpful to have an understanding of how to use the Research Analysis Platform. You can find our previous sessions covering it below:
- Research Analysis Platform Overview: https://www.youtube.com/watch?v=7uoiv2N5YPc
- Introduction to Jupyter Notebooks: https://www.youtube.com/watch?v=Ib4kl1VBbVY
- Exploring and Analyzing Jupyter Notebooks: https://www.youtube.com/watch?v=NX1Czn0_fr8
Comments
You can now view the session on demand: https://youtu.be/762PVlyZJ-U
We have also put together a github repository with the regenie workflow: https://github.com/dnanexus/UKB_RAP/tree/main/GWAS
Presentation slides are also attached below.
You can also learn more about REGENIE by viewing this paper referenced in the presentation: https://www.nature.com/articles/s41588-021-00870-7
Hi, it looks like the PDF link is broken. Could you update it?
Updated slides are now attached
Thank you for bringing this to my attention! Updated slides are below.
Thanks!
How long should one expect a job performing step B (merging the input files, slide 51) to take? I'm currently running this on the 450k release and it's been running for ~23 hours, so just curious.
For anyone looking for liftover guidelines, please see the code and blog post here:
https://github.com/dnanexus-rnd/liftover_plink_beds
https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/gwas-ex
Is there a way to specify multiple inputs with a wildcard? This is necessary when running multiple phenotypes at once, because step 1 then generates numerous loco.gz files. So far, using a wildcard has not worked.
Just a heads up.
The instructions on page 45 of the slide deck state clearly that you need to run liftover from GRCh37 to GRCh38 on the array data, but the merge command is clearly using the GRCh37 data. That means if you use the instructions as written, the loco and pred files from part D will not be compatible with part F, where they are needed.
Perhaps a tutorial on running the liftover workflow would be helpful, since everyone will have to run it in order to run a GWAS.
Are you asking about how to run REGENIE across multiple phenotypes? Using the current version of regenie within swiss-army-knife, I think you can use `--phenoColList` or `--phenoExcludeList` if you want to specify multiple phenotype traits: https://rgcgithub.github.io/regenie/options/
Please let me know if I'm misunderstanding what you mean by "multiple inputs" here.
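As a rough illustration of the option mentioned above, here is a minimal sketch that assembles a regenie step 1 command with `--phenoColList`. The phenotype column names and input file names are hypothetical placeholders, and the remaining flags are only one plausible combination, not the settings used in the webinar.

```shell
# Hypothetical binary phenotype columns; replace with columns from your own phenotype file.
PHENOS="diabetes_cc,hypertension_cc"

# Assemble a regenie step 1 command. File names and block size are placeholders.
STEP1_CMD="regenie --step 1 --bed merged_array --phenoFile pheno_file.phe --covarFile covar_file.phe --phenoColList ${PHENOS} --bsize 1000 --bt --out regenie_step1"

# Print the command for inspection instead of running it.
echo "$STEP1_CMD"
```

The printed string could then be passed to Swiss Army Knife through `-icmd`, in the same way as the step 2 command later in this thread.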
Thank you for following up! I have run REGENIE step 1 using swiss-army-knife and that went fine, but for step 2 you need to specify the output of step 1, which is one file per phenotype. I am struggling to figure out whether there is a way to use the -iin option of dx run with a wildcard, so that I don't have to specify each file by name and my script stays reusable. Hope this makes sense, let me know if you need more clarification!
I would try the following example command:
dx run app-swiss-army-knife `dx ls \*.loco.gz | sed 's/^/-iin=/'` -iin="regenie_step1_pred.list" -iin="pheno_file.phe" -iin="input.bgen" -iin="covar_file.phe" -iin="input.sample" -icmd="regenie --step 2 --bgen input.bgen --ref-first --phenoFile pheno_file.phe --covarFile covar_file.phe --sample input.sample --minMAC 200 --minINFO 0.3 --pred regenie_step1_pred.list --bsize 400 --bt --firth --approx --out step2_chr1;" --name "REGENIE-step2_chr1" --instance-type mem2_ssd1_v2_x8
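The part `dx ls \*.loco.gz | sed 's/^/-iin=/'` lists all the loco files in the current platform folder and prefixes each name with `-iin=`, so every file is passed as a separate input. The backslash stops the local shell from expanding the wildcard, so that dx ls matches it against the platform files instead.
If you prefer, you can first save the output of `dx ls \*.loco.gz | sed 's/^/-iin=/'` into a variable, e.g. LIST_OF_REGENIE_INPUTS, and then use the contents of this variable in the command as $LIST_OF_REGENIE_INPUTS (unquoted, so that the shell splits it into one argument per file; writing $(LIST_OF_REGENIE_INPUTS) would instead try to run a command with that name).
Something like:
dx run app-swiss-army-knife $LIST_OF_REGENIE_INPUTS -iin="regenie_step1_pred.list" ...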
NOTE: This code assumes that output of step1 is compressed loco files (loco.gz).
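For anyone who wants to check the argument expansion before launching a job, here is a minimal sketch. The helper names `to_iin_args` and `list_loco_files` and the file names are made up for illustration; `list_loco_files` stands in for `dx ls \*.loco.gz` so the snippet runs without the dx toolkit, and the final line only prints the command rather than submitting it.

```shell
# Hypothetical helper: drop blank lines and prefix each file name with -iin=
to_iin_args() {
  sed -e '/^$/d' -e 's/^/-iin=/'
}

# Stand-in for `dx ls \*.loco.gz`; the file names below are invented.
list_loco_files() {
  printf '%s\n' step1_1.loco.gz step1_2.loco.gz
}

# Collect the -iin arguments, one per line, in a variable.
LIST_OF_REGENIE_INPUTS=$(list_loco_files | to_iin_args)

# The unquoted expansion is deliberate: the shell splits the variable into
# one argument per file. This breaks if a file name contains spaces.
echo dx run app-swiss-army-knife $LIST_OF_REGENIE_INPUTS -iin="regenie_step1_pred.list"
```

Once the printed command looks right, replacing `list_loco_files` with the real `dx ls` call and removing `echo` should give the same behaviour as the one-line command above.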
May I ask for the decks for the other two youtube videos too? Thx a lot!
You can find them here:
Great! Appreciate the help!