Hi everyone,
For our research project, we need to call low-frequency variants per individual (i.e., as if from a non-diploid genome). For that we usually use lofreq, but it doesn't seem to be available through the Swiss Army Knife.
Is there any way to install lofreq itself? Otherwise, would any of the already available tools provide similar functionality?
Cheers,
Fran
Comments
Some possible solutions could be:
a) Run a web-based terminal, either ttyd (https://ukbiobank.dnanexus.com/app/ttyd) or the Cloud Workstation app (https://ukbiobank.dnanexus.com/app/cloud_workstation). You can install the lofreq tool on the cloud worker, download the data, and process it in this interactive session.
b) You can build your own applet that runs lofreq. Instructions on how to create an applet are reviewed here: https://www.youtube.com/watch?v=A_iki_50Ig0
c) If you have some working experience with Docker, I would try saving a lofreq Docker snapshot on the platform and using it as the Docker image for Swiss Army Knife (-iimage option). DNAnexus will hold a Docker webinar soon, which will include an example of creating a Docker snapshot for a custom tool (including use with Swiss Army Knife).
d) The -iimage option in SAK can also pull a publicly available Docker image. I found a Docker container for lofreq: https://quay.io/repository/biocontainers/lofreq
From the SAK help (see the "Optional Docker image identifier" input below):
$ dx run app-swiss-army-knife -h
usage: dx run app-swiss-army-knife [-iINPUT_NAME=VALUE ...]

App: Swiss Army Knife

A multi-purpose tool for all your basic analysis needs

See the app page for more information:
  https://platform.dnanexus.com/app/swiss-army-knife

Inputs:
  Input files: [-iin=(file) [-iin=... [...]]]

  Command line: -icmd=(string)

  Optional Docker image identifier: [-iimage=(string)]
        Instead of using the default Ubuntu 14.04 environment, the input
        command will be run using the specified Docker image as it would be
        when running 'docker run image cmd'. Example images identifiers are
        'ubuntu:16.04', 'quay.io/ucsc_cgl/samtools'.

Outputs:
  Output files: [out (array:file)]
------------------------------------
So I would try to directly specify "https://quay.io/repository/biocontainers/lofreq" as the input image for SAK and then just specify the lofreq command you would like to run inside SAK.
Many thanks, @Ondrej Klempir. I will explore these options and see if we can make any of them work as expected. I might come back here to ask for further clarification :)
Just a quick update. To get the publicly available Docker image to work properly, I needed to launch it with an explicit tag:
dx run app-swiss-army-knife -iimage=quay.io/biocontainers/lofreq:2.1.5--py310h8360dc1_7
This is because the repository has no version tagged "latest", which SAK appears to expect by default.
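For anyone following along, here is a minimal sketch of how the pinned image might be combined with an actual lofreq command. The BAM, reference, and output file names are hypothetical placeholders, and the assumption that SAK places -iin files in the working directory (so the command can refer to them by base name) should be checked against the job log. The function only prints the dx command so it can be inspected before anything is submitted.

```shell
#!/usr/bin/env bash
# Sketch: build a Swiss Army Knife submission that runs lofreq from the
# pinned biocontainers image. All file names are hypothetical placeholders.

image="quay.io/biocontainers/lofreq:2.1.5--py310h8360dc1_7"

# Print the dx command for one sample; nothing is submitted here.
build_lofreq_job() {
  local bam="$1" ref="$2" out="$3"
  # Assumption: -iin files are downloaded into the working directory,
  # so the lofreq command refers to them by base name.
  local cmd="lofreq call -f $(basename "$ref") -o ${out} $(basename "$bam")"
  printf 'dx run app-swiss-army-knife -iimage=%s -iin=%s -iin=%s -icmd="%s" --yes\n' \
    "$image" "$bam" "$ref" "$cmd"
}

build_lofreq_job "/bams/sample1.bam" "/ref/genome.fa" "sample1.lofreq.vcf"
```

Index files (for example the .bai and .fai) may also need to be passed with additional -iin arguments, depending on how lofreq is invoked.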
If you are running swiss-army-knife via a dx run shell script, as in the DNAnexus GWAS GitHub repositories, you can simply add the install lines to your execution script (see the scripts here: https://github.com/dnanexus/UKB_RAP/tree/main/GWAS/regenie_workflow).
To install lofreq, or really any other precompiled binary package, use these lines:
# Command run inside Swiss Army Knife: download, unpack, then run lofreq (prints its help)
run_lofi_cmd="wget https://github.com/CSB5/lofreq/raw/master/dist/lofreq_star-2.1.5_linux-x86-64.tgz; \
  tar zxvf lofreq_star-2.1.5_linux-x86-64.tgz; \
  ./lofreq_star-2.1.5_linux-x86-64/bin/lofreq"

# data_file_dir and project must already be set in your script
dx run swiss-army-knife -iin="${data_file_dir}/WES_cX_qc_pass.vcf.gz" \
  -icmd="${run_lofi_cmd}" --tag="LOFREQ" --instance-type "mem1_ssd1_v2_x16" \
  --destination="${project}:/data/wes_lofreq/" --brief --yes
Mind you, you cannot just cut and paste all of this into a bash script and execute it. The -iin file and the destination directory for the dx command need to exist. ${data_file_dir} is a directory where I store some processed VCF files. If you fix the input and output files and then run this as is, the job will fail because it doesn't actually create any output files. If you look at the log file, you will see that it has actually downloaded the software, uncompressed it, and then run the executable, displaying the help screen on standard out.
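The two preconditions mentioned here (the input file and the destination folder must exist) could be checked before submitting. The following is only a sketch: the function name preflight is hypothetical, and it assumes that dx describe exits non-zero when the path cannot be resolved and that dx mkdir -p tolerates an existing folder.

```shell
#!/usr/bin/env bash
# Sketch: check the input file and create the destination folder before
# submitting. "preflight" is a hypothetical helper name.

preflight() {
  local input="$1" dest="$2"
  # Assumption: dx describe returns non-zero when the object is not found.
  if ! dx describe "$input" >/dev/null 2>&1; then
    echo "missing input: $input" >&2
    return 1
  fi
  # -p creates parent folders and does not complain if the folder exists.
  dx mkdir -p "$dest"
}

# Example (not run here):
# preflight "${data_file_dir}/WES_cX_qc_pass.vcf.gz" "${project}:/data/wes_lofreq/" && dx run ...
```

Chaining the submission after the check with && means nothing is launched if the input is missing.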
Best of luck
-Phil
Thanks for your suggestion, @Phil Greer.
Would the installation need to be done on every single instance for a batch computation? If so, is there any particular benefit compared with using a Docker image as suggested above?
Cheers,
Fran
Awesome!
For any batch processing, please keep in mind that Docker Hub and other registries limit the number of pulls (200 pulls/user/day). This is why I would recommend saving the Docker image to a file in your DNAnexus project.
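One way this could look in practice is sketched below: pull the image once, save it to a tarball, and upload it to the project. The tarball name and the /docker/ project folder are hypothetical. By default the script only prints the commands (DRY_RUN=1); setting DRY_RUN=0 would execute them and requires both docker and dx to be installed.

```shell
#!/usr/bin/env bash
# Sketch: pull the lofreq image once, save it as a tarball, and upload it
# to the project so batch jobs do not hit the registry.
# The tarball name and the /docker/ folder are hypothetical.

image="quay.io/biocontainers/lofreq:2.1.5--py310h8360dc1_7"
tarball="lofreq_2.1.5.tar"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

save_image() {
  run docker pull "$image"
  run docker save -o "$tarball" "$image"
  run dx upload "$tarball" --path /docker/
}

save_image
```

Recent Swiss Army Knife versions appear to accept such a saved file through an image_file input; that input is not listed in the help text quoted earlier in this thread, so treat it as an assumption and confirm it with dx run app-swiss-army-knife -h.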
The dx submission scripts can be looped over the files you are running them on (i.e., each chromosome file or each individual).
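A rough sketch of such a loop, one job per chromosome file, is below. The file naming pattern is modelled on the WES_cX_qc_pass.vcf.gz input shown earlier and is an assumption, as are the default directory and the placeholder command. The echo keeps it a dry run, so the printed lines lose their shell quoting and are for inspection only.

```shell
#!/usr/bin/env bash
# Sketch: submit one Swiss Army Knife job per chromosome file.
# File pattern, default directory, and command are assumptions.

data_file_dir="${data_file_dir:-/data/wes}"   # hypothetical default
run_cmd="${run_cmd:-ls -l}"                   # placeholder for the real command

submit_all() {
  local chr
  for chr in $(seq 1 22) X; do
    # echo keeps this a dry run; remove it to actually submit the jobs.
    echo dx run swiss-army-knife \
      -iin="${data_file_dir}/WES_c${chr}_qc_pass.vcf.gz" \
      -icmd="${run_cmd}" --tag="LOFREQ" --brief --yes
  done
}

submit_all
```

The same pattern works for a list of individuals: replace the chromosome list with the sample identifiers and adjust the file name accordingly.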
If you look at the swiss-army-knife log file, you will see that each software package is downloaded and installed every time you start Swiss Army Knife. The good thing about this is that if the software gets updated, you just have to point the script to the new tarball.
It just depends on how much up-front work you want to do; it won't make much difference to the overall runtime.