This article is a guide to using RStudio (Posit Workbench) on the UK Biobank Research Analysis Platform (UKB-RAP). It will help you learn how to:
- Work with UKB-RAP project files inside RStudio Workbench
- Manage libraries in RStudio Workbench reproducibly
- Work with phenotype data in R
- Use the dx toolkit and dxFUSE for file transfer
- Save RStudio projects to, and restore them from, UKB-RAP project storage
- Use CRAN for reproducible R package management within an RStudio project
- Use a Docker container via ttyd
How to access RStudio
Pull up an RStudio Workbench instance by navigating to Posit Workbench (RStudio) under the Tools tab.
Start a new RStudio Workbench workstation by clicking on the "New Workbench" button.
Specify the project you wish to have access to in your RStudio instance. This selection will determine the files you'll have access to.
- Instance type: determines the number of GPUs or CPUs, the amount of memory, and the amount of storage. For more information about instance types, see this page.
- Select "Start Environment"
- When your RStudio instance is ready, you will be able to click the 'Open' link provided.
Working with phenotypic data and bulk files
What is dx toolkit?
- dx toolkit is a set of command-line tools that you can run from the terminal or call from R scripts via system function calls
- dx toolkit is installed by default in your RStudio Workbench. You can use the terminal window to access dx toolkit.
- The important commands:
- dx download to transfer files from your project to your workstation
- dx upload to transfer files from your workstation to your project storage
- dx-backup-folder, dx-restore-folder to save/restore RStudio project folders and packages
For a further list of commands, see the Index of commands page.
Import files into RStudio
You can access the files from your project by typing in your terminal: dx download <file_name>
With a bulk file, this may look something like:
dx download "/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - interim 450k release/ukb12345_c1_b0_v1.vcf.gz.tbi"
You can also access bulk files from your RStudio workstation through dxFUSE, which exposes your project files under a mount path (for example '/mnt/project/Bulk/file/path/here'. Read more).
Phenotypic data files must be created before you can work with them in RStudio. An example of how to extract phenotypic data can be found in the A110 UKB GitHub notebook. You can also use the Table Exporter app, which you can read more about here.
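The two access routes described above can be combined in a small terminal helper. This is only a sketch: fetch_project_file and MOUNT_ROOT are names made up for illustration (they are not part of the dx toolkit), and the mount point is assumed to be /mnt/project.

```shell
# Sketch: read a project file through the dxFUSE mount when it is visible
# there, and fall back to `dx download` otherwise.
# MOUNT_ROOT is an illustrative variable; /mnt/project is the assumed mount point.
MOUNT_ROOT="${MOUNT_ROOT:-/mnt/project}"

# fetch_project_file is a hypothetical helper, not a dx toolkit command.
# $1: path of the file inside the project (starting with "/")
# Prints the local path at which the file can then be read.
fetch_project_file() {
  local project_path="$1"
  local mounted="${MOUNT_ROOT}${project_path}"
  if [ -e "$mounted" ]; then
    # Visible through dxFUSE: use it in place, no copy is made.
    printf '%s\n' "$mounted"
  else
    # Not mounted: copy the file into the current working directory.
    # dx messages go to stderr so that stdout carries only the local path.
    dx download "$project_path" >&2 && printf '%s\n' "$(basename "$project_path")"
  fi
}
```

Calling local_file="$(fetch_project_file "/path/in/project/file.txt")" then gives a path that can be passed to R or to another command. Reading through the mount avoids using workstation storage, while dx download gives you a local copy.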
Working reproducibly in RStudio
When you conduct analysis in RStudio on your local computer, your scripts, data, and results are stored on that computer. You provide the software environment and control everything within it. With cloud-based analysis, however, scripts, data, and results live on a temporary worker in the cloud. This means that any files you generate will disappear once your session ends. You will need to transfer any results you wish to keep back to UKB-RAP project storage.
You can use renv for reproducible library management.
- Save and restore projects using dx-backup-folder and dx-restore-folder.
- Note: if you wish to upload a single file to your UKB-RAP project, you can use:
dx upload <file_name>
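As a rough sketch, several result files can be sent back to project storage in one go by looping over a folder and calling dx upload for each file. The helper name upload_results and the folder name in the usage note are hypothetical; only dx upload itself comes from the dx toolkit.

```shell
# Sketch: upload every regular file in a local folder to project storage.
# upload_results is a hypothetical helper, not a dx toolkit command.
# $1: local folder holding the files to keep
upload_results() {
  local dir="$1" count=0 f
  for f in "$dir"/*; do
    # Skip sub-folders and the unmatched glob of an empty folder.
    [ -f "$f" ] || continue
    # Stop at the first failed upload so that the problem is noticed.
    dx upload "$f" || return 1
    count=$((count + 1))
  done
  echo "uploaded $count file(s)"
}
```

For instance, upload_results results would upload each file in a local folder named results. Running it before you terminate the session keeps those files from being lost with the temporary worker.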
What is renv?
- It encapsulates installed packages and dependencies tied to an RStudio project
- First, create a project in RStudio. This is what your packages and dependencies will be tied to
- Install renv via your RStudio console
install.packages("renv")
- Initialise renv
renv::init()
- Install your packages, download your files of interest, and conduct your analysis. For example:
install.packages("tidyverse")
- Generate a snapshot of your project
renv::snapshot()
- Back up your current project folder to the ".Backups" folder in your UKB-RAP project by typing in your terminal:
dx-backup-folder -d /.Backups/snap_rstudio.tar.gz
- The next time you wish to restore your RStudio project, you can type in your terminal:
dx-restore-folder /.Backups/snap_rstudio.tar.gz my_rstudio_project
- If you would like to install more packages after you have restored your project, run this in the RStudio console before you begin:
renv::activate()
and this when you are done:
renv::snapshot()
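The end-of-session steps (snapshot with renv, then back up the folder) can be wrapped in one terminal helper. This is a sketch under a few assumptions: save_session and backup_path are hypothetical names, the helper is run from inside the RStudio project folder, and the dated file name is just one way to tell successive backups apart.

```shell
# Sketch: record package versions with renv, then back up the project folder
# to UKB-RAP project storage. save_session and backup_path are hypothetical
# helpers, not dx toolkit commands.

# Build a dated backup path under /.Backups from a base name.
backup_path() {
  printf '/.Backups/%s_%s.tar.gz\n' "$1" "$(date +%Y%m%d)"
}

# Run from inside the RStudio project folder.
# $1: base name for the backup (defaults to snap_rstudio)
save_session() {
  local name="${1:-snap_rstudio}"
  # prompt = FALSE skips renv's interactive confirmation.
  Rscript -e 'renv::snapshot(prompt = FALSE)' || return 1
  # -d sets the destination in project storage, as in the backup command above.
  dx-backup-folder -d "$(backup_path "$name")"
}
```

Calling save_session with no arguments would write a backup whose name starts with snap_rstudio_ followed by the current date. The matching restore step is the dx-restore-folder command shown earlier, pointed at whichever dated file you want.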
Remember to terminate the session when you are done, either directly in the RStudio application or from the Monitor section by using the Terminate button. If you forget this step, your instance keeps running and you continue to be charged for it.
Please check out our UKB GitHub for more information on how you can use RStudio to explore UKB data. You can find the UKB GitHub here, or alternatively read more about what these notebooks are in this article.
So, to summarise:
- Start RStudio Workbench from the Tools tab on the UKB-RAP
- Transfer data using dx download or dxFUSE
- Use renv to save software dependencies
- Use dx-backup-folder and dx-restore-folder for project management
- Don't forget to terminate your instance when done