WDL workflow and bash/R outputs in DNAnexus?
I have a WDL workflow with a task that generates Manhattan plots using an R script inside a Docker container.
```
# QQ and Manhattan plots
task Plots {
input {
Array[String] phenotype_names # Format: "<phenotype_name>.regenie". Names of the files produced in join_Output.
Array[Int] chr_list
Array[File] file_input
String folder_name
# Runtime
String docker
}
Float regenie_files_size = size(file_input, "GiB")
# Plots are produced for each phenotype.
# For each phenotype, a file containing all of the hits from Step 2 is output.
# For each phenotype, a file containing a subset of all of the hits where "-LOG10P > 1.3" from Step 2 is output.
command <<<
set -euo pipefail
for file in ~{sep=' ' file_input}; do \
awk '$12 > 1.3' $file >> ${file%.regenie}_subset.regenie; \
chmod 777 * \
mv ${file%.regenie}_subset.regenie .; \
mv $file .; \
done
R --no-save --args ~{folder_name} ~{sep=' ' file_input} <<RSCRIPT
library('data.table')
library('qqman')
args <- commandArgs(trailingOnly = TRUE)
# This indicates where the plots are going to be stored
output_dir <- args[1]
print(output_dir)
# Now we provide the filenames
file_paths <- args[2:(length(args))]
print(file_paths)
for (file in file_paths) {
print(file)
regenie_output <- fread(file)
regenie_ADD_subset <-subset.data.frame(regenie_output, TEST=="ADD")
regenie_ADD_subset[,"CHROM"] <-as.numeric(unlist(regenie_ADD_subset[,"CHROM"]))
regenie_ADD_subset[,"LOG10P"] <-as.numeric(unlist(regenie_ADD_subset[,"LOG10P"]))
regenie_ADD_subset[,"GENPOS"] <-as.numeric(unlist(regenie_ADD_subset[,"GENPOS"]))
qq_plot = substr(file,1,nchar(file)-8)
qq_plot = paste0(output_dir,"/", qq_plot,"_", "qqplot.png")
print(qq_plot)
png(qq_plot, width = 6, height = 4, unit='in', res=300)
p = 10 ^ (-1 * (as.numeric(unlist(regenie_ADD_subset[,"LOG10P"]))))
print(qq(p))
dev.off()
manhattan_plot = substr(file,1,nchar(file)-8)
manhattan_plot = paste0(output_dir,"/",manhattan_plot,"_", "manhattan.png")
print(manhattan_plot)
png(manhattan_plot, width = 6, height = 4, unit='in', res=300)
print(manhattan(regenie_ADD_subset, chr="CHROM", bp="GENPOS", snp="ID", p="LOG10P", logp=FALSE, annotatePval = 5E-8))
dev.off()
}
RSCRIPT
>>>
output {
Array[File] output_plots = glob("*.png")
Array[File] output_regenie = glob("*.regenie")
}
runtime {
dx_instance_type: dx_instance_type
docker: docker
dx_timeout: '24H'
}
parameter_meta {
file_input: {
help: "The join files from the previous task",
patterns: [".regenie"],
stream: true,
localization_optional:true
}
}
}
```
However, when I run it on the RAP platform I get an error stating that the files from this step
```
set -euo pipefail
for file in ~{sep=' ' file_input}; do \
awk '$12 > 1.3' $file >> ${file%.regenie}_subset.regenie; \
mv ${file%.regenie}_subset.regenie .; \
mv $file .; \
done
```
are read-only:
```
[error] failure executing Task action 'run'
java.lang.Exception: job script function run_command exited with permanent fail code 1
/home/dnanexus/meta/commandScript: line 6: /home/dnanexus/mnt/inputbe9e832f-bbef-4034-adf7-84c0371da0b0/filename_subset.regenie: Read-only file system
```
How can I set this up so that the task can write these files on DNAnexus and the workflow runs?
Any help is appreciated.
Thank you
Comments
Would it work to copy the files instead of moving them?
My idea
I see "stream: true" in the parameter_meta spec for the input files. Mounting/streaming files is typically done via dxfuse, and that is indeed a read-only file system. Because the awk output path is derived from $file, the subset file is written under the mount, which fails. In the awk and mv operations I would avoid using a prefix like "/home/dnanexus/mnt/input......." for anything that is written, use some other prefix instead (for example the task's working directory), and test whether that works.
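To illustrate the suggestion above, here is a minimal sketch of the loop rewritten so that it only reads from the input directory and writes everything to the current working directory. It assumes the inputs really are on a read-only mount, as the error message suggests. The temporary directories, the file name pheno1.regenie and its two data rows are hypothetical stand-ins so the snippet can run outside DNAnexus. Inside the WDL command block, the glob would be replaced by ~{sep=' ' file_input}.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the dxfuse mount: a directory we only read from.
workdir="$(mktemp -d)"
input_dir="$workdir/mnt_input"
mkdir -p "$input_dir" "$workdir/job"
# Two made-up rows with 12 columns; only the second has column 12 > 1.3.
printf 'a b c d e f g h i j k 0.5\na b c d e f g h i j k 2.0\n' > "$input_dir/pheno1.regenie"
chmod a-w "$input_dir"   # mimic a read-only location

cd "$workdir/job"
for file in "$input_dir"/*.regenie; do
    # Strip the directory and extension so outputs land in the working directory.
    name="$(basename "$file" .regenie)"
    # Read from the mount, write the filtered rows locally.
    awk '$12 > 1.3' "$file" > "${name}_subset.regenie"
    # Copy instead of move: mv would have to delete the source on the mount.
    cp "$file" "${name}.regenie"
done
ls -1
```

With this layout the glob("*.regenie") output would pick up both the copied file and the subset from the working directory. The R step would then need to be given the local file names rather than the original mounted paths, since it builds the plot names from them.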