WDL: how to specify a directory in the JSON input.
My previous question (https://community.dnanexus.com/s/question/0D5t000004RYhxFCAT/wdl-specify-directory-as-input) was about the WDL syntax itself.
Now I want to specify a **directory** in the JSON file that describes the WDL inputs.
I understand that a plain file should be specified as a JSON object containing the file ID, something like:
```
"stage-common.myinput": {
  "$dnanexus_link": {
    "project": "project-AZIYDNAZDINZDAIDN",
    "id": "file-12345"
  }
},
(...)
```
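For illustration, the same link object can be generated rather than typed by hand. This is a minimal sketch using only the Python standard library; the helper name `dx_file_link` is hypothetical, and the project and file IDs are the placeholders from the snippet above.

```python
import json


def dx_file_link(project_id: str, file_id: str) -> dict:
    """Return the JSON-serialisable link object for one platform file."""
    return {"$dnanexus_link": {"project": project_id, "id": file_id}}


# Placeholder IDs copied from the example above; replace with real ones.
inputs = {
    "stage-common.myinput": dx_file_link("project-AZIYDNAZDINZDAIDN", "file-12345"),
}

# Print the document that would be saved as the inputs JSON file.
print(json.dumps(inputs, indent=2))
```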
But what about **directories**? How can I get the ID of a directory, if it has one? I cannot find this in the documentation.
I tried
```
dx describe "the/path/to/dir"
dxpy.exceptions.DXCLIError: No matches found for the/path/to/dir
```
Thanks,
Pierre
Comments
Would specifying it as a String in dnax format work? Something like:
{
"myWorkflowName.stepA.dir_name": "String"
}
~
{
"stage-common.myinput": "dx://project-xyz:path/to/folder/"
}
Based on this example,
https://github.com/dnanexus/dxCompiler/blob/develop/contrib/beginner_example/bam_chrom_counter_input.json
I would also try replacing "file-XYZ" with "the/path/to/dir".
Thanks. No, I tried various things like "dx:..." but it doesn't work on my side.
Does it compile and run if you hardcode the dx path directly in the WDL code, without specifying it in the JSON?
Yes, the code compiles and launches, but it fails when the directory is used. For now I tried putting a string as the value of "dir_name" in the JSON file. I'll try without JSON tomorrow.
For now, I wonder whether I should just use a tool like `dx find` (https://documentation.dnanexus.com/user/helpstrings-of-sdk-command-line-utilities ) with a plain string for the `--path PROJECT:FOLDER` argument.
Hmmm, yes, that is actually a good idea. I like it.
What if you do something like this?
    task list_files_in_folder {
      input {
        String folder_of_interest
      }
      command <<<
        # Qualify the folder with the project the job runs in
        folder="${DX_PROJECT_CONTEXT_ID}:~{folder_of_interest}"
        # List the objects directly in that folder, one ID per line
        dx find data --path "${folder}" --norecurse --brief
      >>>
      output {
        Array[String] files = read_lines(stdout())
      }
    }
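To make the shape of that `files` array concrete, here is a small sketch of how each line could be split into IDs. It assumes that `dx find data --brief` prints one `project-xxxx:file-yyyy` pair per line; that format is an assumption to check against your own output, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_brief_line(line: str) -> Tuple[Optional[str], str]:
    """Split one listing line into (project_id, object_id).

    Assumes the form "project-xxxx:file-yyyy"; a bare object ID
    without a project prefix is returned with project_id set to None.
    """
    line = line.strip()
    if ":" in line:
        project_id, object_id = line.split(":", 1)
        return project_id, object_id
    return None, line


def parse_brief_output(text: str) -> List[Tuple[Optional[str], str]]:
    """Parse a whole listing, skipping blank lines."""
    return [parse_brief_line(l) for l in text.splitlines() if l.strip()]


# Made-up IDs, for illustration only.
sample = "project-abc:file-111\nproject-abc:file-222\n"
for project_id, file_id in parse_brief_output(sample):
    print(project_id, file_id)
```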
Thanks, I'll try this tomorrow. But then, in `Array[String] files`, I'll get a list of file IDs, say the IDs of VCF files. How do I pass those IDs to bcftools?
```
bcftools view 'dx:PROJECT/<FILE-ID>' ?
```
OK, here is a working solution.
Using the WDL type `Directory` was a bad choice. A better solution was to use a String containing the path to the directory and to call `dx find data` to get the list of files:
```
workflow MY_WORKFLOW {
  input {
    String dirpath = "NO_DIR"
  }
  call FIND_VCF_FILES {
    input:
      dirpath = dirpath
  }
  (....)
task FIND_VCF_FILES {
  input {
    String dirpath
  }
  command <<<
    hostname 1>&2
    set -e
    set -x
    pwd 1>&2
    dx find data --path "${DX_PROJECT_CONTEXT_ID}:~{dirpath}" --class file --brief --name "*.vcf.gz" > vcfs.list
  >>>
  output {
    File vcfslist = "vcfs.list"
  }
  runtime {
    cpu : 1
    memory : "1MB"
  }
}
```
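As for feeding the listed files to a local tool such as bcftools, one possible follow-up is to download each entry of `vcfs.list` first. The sketch below only builds the `dx download` command lines and does not run them. It assumes each line of `vcfs.list` is a reference that `dx download` accepts (for example `project-xxxx:file-yyyy`) and that `--output` may name a directory; the function name and sample IDs are hypothetical.

```python
from typing import List


def download_commands(listing: str, dest_dir: str = ".") -> List[List[str]]:
    """Build one `dx download` argument list per non-empty line of the listing."""
    commands = []
    for line in listing.splitlines():
        ref = line.strip()
        if not ref:
            continue  # skip blank lines
        # Assumption: `ref` is a file reference that `dx download` understands.
        commands.append(["dx", "download", ref, "--output", dest_dir])
    return commands


# Made-up listing standing in for the content of vcfs.list.
listing = "project-abc:file-111\nproject-abc:file-222\n"
for cmd in download_commands(listing, "vcfs"):
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)` inside a job where the `dx` client is available, after which bcftools reads the downloaded local files.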
Yes, I was just writing my reply... Code similar to what you posted worked on my side.