Webinar Nov 2: GWAS on the Research Analysis Platform using regenie

Brenton Pyle DNAnexus Team

15 October 2021 00:00
16 comments

An introduction to running GWAS using Regenie on the Research Analysis Platform. We will demonstrate how to run the analysis using a diabetes phenotype on the 300k data.

Date: Tuesday November 2, 3pm GMT/8am PDT

Topics include:

Genomic file preprocessing and filtering using the Swiss Army Knife App
Building phenotype/covariate files for cohorts using Spark JupyterLab
Running regenie using the Swiss Army Knife App

Prior to the session, it is helpful to have an understanding of how to use the Research Analysis Platform. You can find our previous sessions covering it below:

Research Analysis Platform Overview: https://www.youtube.com/watch?v=7uoiv2N5YPc
Introduction to Jupyter Notebooks: https://www.youtube.com/watch?v=Ib4kl1VBbVY
Exploring and Analyzing Jupyter Notebooks: https://www.youtube.com/watch?v=NX1Czn0_fr8

Comments

Brenton Pyle DNAnexus Team
- 03 November 2021 15:49
You can now view the session on demand: https://youtu.be/762PVlyZJ-U

We have also put together a GitHub repository with the regenie workflow: https://github.com/dnanexus/UKB_RAP/tree/main/GWAS

Presentation slides are also attached below.

You can also learn more about REGENIE by viewing this paper referenced in the presentation: https://www.nature.com/articles/s41588-021-00870-7

Former User of DNAx Community_39
- 21 June 2022 08:26
Hi, it looks like the PDF link is broken. Could you update it?

Brenton Pyle DNAnexus Team
- 21 June 2022 15:32
Updated slides are now attached

Brenton Pyle DNAnexus Team
- 21 June 2022 15:33
Thank you for bringing this to my attention! Updated slides are below.

Former User of DNAx Community_39
- 22 June 2022 05:24
Thanks!

Former User of DNAx Community_13
- 30 June 2022 19:31
How long should one expect a job performing step B (merging the input files, slide 51) to take? I'm currently running this on the 450k release and it's been running for ~23 hours, so just curious.

Chai Fungtammasan DNAnexus Team
- 03 October 2022 16:29
For people looking for liftover guidelines, please see the code and blog post here:

https://github.com/dnanexus-rnd/liftover_plink_beds
https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/gwas-ex

Former User of DNAx Community_91
- 21 December 2022 18:21
Is there a way to specify multiple inputs with a wildcard? This is necessary when running multiple phenotypes at once, which generates numerous loco.gz files. So far, using a wildcard has not worked.

Former User of DNAx Community_28
- 22 December 2022 10:04
Just a heads up.

While the instructions on page 45 of the slide deck state clearly that you need to run liftover from GRCh37 to GRCh38 on the array data, the merge command is clearly using the GRCh37 data. That means that if you follow the instructions as written, the loco and pred files from part D will not be compatible with part F, where they are needed.

Perhaps a tutorial on running the liftover workflow would be helpful, since everyone will have to run it in order to run a GWAS.

Alexandra Lee DNAnexus Team
- 09 January 2023 17:03
Are you asking about how to run REGENIE across multiple phenotypes? Using the current version of regenie within swiss-army-knife, I think you can use `--phenoColList` or `--phenoExcludeList` if you want to specify multiple phenotype traits: https://rgcgithub.github.io/regenie/options/

Please let me know if I'm misunderstanding what you mean by "multiple inputs" here.
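
As a rough illustration of the option mentioned above: `--phenoColList` takes a comma-separated list of phenotype column names. The sketch below only builds and prints that option string; the column names `diabetes_cc` and `bmi_cc` and the helper `join_by_comma` are hypothetical placeholders, not names from the webinar materials.

```shell
# Join all arguments with commas, the form expected by --phenoColList.
# Runs in a subshell so the IFS change does not leak into the caller.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Hypothetical phenotype column names; replace with columns from your own phenotype file.
pheno_cols=$(join_by_comma diabetes_cc bmi_cc)

# Print the option fragment that would be appended to a regenie command line.
printf '%s %s\n' '--phenoColList' "$pheno_cols"
```

The printed fragment can then be pasted into the regenie call given to `-icmd`.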

Former User of DNAx Community_91
- 13 January 2023 15:19
Thank you for following up! I have run REGENIE step 1 using swiss-army-knife and that went fine, but for step 2 you need to specify the output of step 1, which is one file per phenotype. I am struggling to figure out whether there is a way to use the -iin option of dx run with a wildcard, so that I don't have to specify each file by name and my script stays reusable. Hope this makes sense, let me know if you need more clarification!

Ondrej Klempir DNAnexus Team
- 18 January 2023 14:02
I would try the following example command:

dx run app-swiss-army-knife `dx ls \*.loco.gz | sed 's/^/-iin=/'` -iin="regenie_step1_pred.list" -iin="pheno_file.phe" -iin="input.bgen" -iin="covar_file.phe" -iin="input.sample" -icmd="regenie --step 2 --bgen input.bgen --ref-first --phenoFile pheno_file.phe --covarFile covar_file.phe --sample input.sample --minMAC 200 --minINFO 0.3 --pred regenie_step1_pred.list --bsize 400 --bt --firth --approx --out step2_chr1;" --name "REGENIE-step2_chr1" --instance-type mem2_ssd1_v2_x8

The part `dx ls \*.loco.gz | sed 's/^/-iin=/'` lists all the loco files and prefixes each name with -iin=, so it expands to one -iin= argument per loco file.

In addition, you may want to first save the output of `dx ls \*.loco.gz | sed 's/^/-iin=/'` in a variable, e.g. LIST_OF_REGENIE_INPUTS, and then use the content of this variable in the command as $LIST_OF_REGENIE_INPUTS (unquoted, so that the shell splits it into one argument per file).

Something like:

dx run app-swiss-army-knife $LIST_OF_REGENIE_INPUTS -iin="regenie_step1_pred.list" ...

NOTE: This code assumes that the output of step 1 is compressed loco files (loco.gz).
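
A minimal sketch of the variable-based approach described above, written so it runs without the dx toolkit: the file names `step1_1.loco.gz` and `step1_2.loco.gz` and the helper `to_iin_args` are hypothetical stand-ins, and the listing is hard-coded where the real workflow would use the output of `dx ls`.

```shell
# Prefix every file name read from stdin with -iin= (one argument per line).
to_iin_args() {
    sed 's/^/-iin=/'
}

# Stand-in for the output of: dx ls \*.loco.gz
# (hypothetical file names, so the sketch runs without the dx toolkit).
listing='step1_1.loco.gz
step1_2.loco.gz'

LIST_OF_REGENIE_INPUTS=$(printf '%s\n' "$listing" | to_iin_args)

# Left unquoted on purpose: word splitting turns each line into its own argument,
# which is how the variable would be passed to dx run app-swiss-army-knife.
set -- $LIST_OF_REGENIE_INPUTS
printf '%s\n' "$@"
# prints:
# -iin=step1_1.loco.gz
# -iin=step1_2.loco.gz
```

Printing the arguments first is a cheap way to check what will be passed before launching a job. Note that this splitting assumes the file names contain no spaces.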

Former User of DNAx Community_92
- 23 March 2023 18:11
May I ask for the decks for the other two youtube videos too? Thx a lot!

Brenton Pyle DNAnexus Team
- 23 March 2023 19:50
You can find them here:
- Overview: https://community.dnanexus.com/s/feed/0D5t000004DD30SCAT
- Intro to Jupyter: https://community.dnanexus.com/s/question/0D5t000003PKqoZCAT/webinar-oct-18-introduction-to-jupyter-notebooks-on-rap
- Exploring & Analyzing: https://community.dnanexus.com/s/question/0D5t000003PKrXNCA1/webinar-oct-25-exploring-and-analyzing-uk-biobank-data-with-jupyter-notebooks
Former User of DNAx Community_92
- 23 March 2023 20:04
Great! Appreciate the help!

Xinwu Lu
- 16 October 2025 12:42
Where are the presentation slides?

