WDL: what is the correct way to specify a Docker container? (The entity container-1234 could not be found)
I'm trying to extract variants from a VCF file. My WDL task looks like this:
```
task SEARCH_VARIANTS {
  input {
    File vcf
    File bed
  }
  command <<<
    set -e
    set -x
    # one output row per sample: CHROM, POS, END, SVTYPE, SVLEN, FILTER, SAMPLE, GT
    bcftools query --regions-file '~{bed}' -f '[%CHROM\t%POS\t%END\t%INFO/SVTYPE\t%INFO/SVLEN\t%FILTER\t%SAMPLE\t%GT\n]' '~{vcf}' > output.tsv
  >>>
  output {
    File variants = "output.tsv"
  }
  runtime {
    cpu: 1
    memory: "1GB"
    docker: "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"
  }
}
```
But every time the workflow reaches the following scatter loop:
```
scatter (vcf in SPLIT_FILE.each_vcf) {
call SEARCH_VARIANTS {
input:
vcf=vcf,
bed=bed
}
}
```
I get the following error. Is it related to the Docker container?
> An input or bundled dependency could not be cloned into the project: ResourceNotFound: Error while cloning objects [file-GPXPz48JG7JFb0V0q10kYzXz]: The entity container-GPfyjJQJ2JpYxGYFb84Jvfg5 could not be found
>
> Project: UKBB (project-1234), Executable: workflow FindSV (workflow-1234), Analysis: FINDSV (analysis-1234)
>
> Error propagated from Analysis: FINDSV (analysis-1234), Stage: scatter (vcf in SPLIT_FILE.each_vcf) (stage-3), Executable: applet FindSV_frag_stage-2 (applet-1234), Job: scatter (vcf in SPLIT_FILE.each_vcf) (job-1234), Error type: InputError
What does it mean? How can I fix this?
Comments
For Docker, I prefer to first save the image as a tar file and use the following option:
https://github.com/dnanexus/dxCompiler/blob/develop/doc/ExpertOptions.md#storing-a-docker-image-as-a-file
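A minimal shell sketch of that approach, assuming Docker and the `dx` CLI are installed locally and you are logged in to the target project. `tarball_name` and `save_and_upload` are hypothetical helper names, not part of dxCompiler or dx-toolkit:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for storing a Docker image as a platform file.

# Derive a tar file name from an image reference, e.g.
# quay.io/biocontainers/bcftools:1.16--hfe4b78e_1 -> bcftools_1.16--hfe4b78e_1.tar
tarball_name() {
  local name="${1##*/}"               # drop the registry/namespace prefix
  printf '%s.tar\n' "${name//:/_}"    # replace ':' so the tag is file-name friendly
}

# Save a local image to a tar file and upload it to the current project.
save_and_upload() {
  local image="$1" tarball
  tarball="$(tarball_name "$image")"
  docker save -o "$tarball" "$image" || return 1
  dx upload --brief "$tarball"        # --brief prints only the new file ID
}
```

For example, `save_and_upload quay.io/biocontainers/bcftools:1.16--hfe4b78e_1` should print the ID of the uploaded tar file, which then goes into the runtime section in the `docker: "dx://file-…"` form used later in this thread.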
In this specific case, I wonder whether the issue is the Docker spec. Try running `dx describe` on the file file-GPXPz48JG7JFb0V0q10kYzXz. Besides bcftools, can you run other commands on the files, such as `head`?
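To try the `head` check from inside the job, the command section could preview its inputs before calling bcftools. A minimal sketch, assuming bash and GNU gzip are available in the container; `peek_inputs` and the `PEEK_LINES` variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical debugging helper: for each argument, report whether it is a
# readable local file and, if so, print its first few lines.
peek_inputs() {
  local f status=0
  for f in "$@"; do
    if [[ -r "$f" ]]; then
      echo "== $f"
      # -c: write to stdout, -d: decompress, -f: pass non-gzip input through
      # unchanged, so bgzipped VCFs and plain-text BED files can both be previewed.
      gzip -cdf "$f" | head -n "${PEEK_LINES:-3}"
    else
      echo "NOT READABLE: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `peek_inputs '~{vcf}' '~{bed}'` as the first line of `command <<< >>>`: with `set -e`, an unreadable input stops the job there, before bcftools runs.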
Thanks, I created the Docker image. bcftools works on my local computer:
```
$ docker run f0f39c498d87 bcftools --version
bcftools 1.13
Using htslib 1.13+ds
```
I saved and uploaded the Docker image as a tar file, and used its file ID with the following syntax:
```
runtime {
cpu : 1
memory : "1GB"
docker: "dx://file-12345"
}
```
But I still get the following error:
```
An input or bundled dependency could not be cloned into the project: ResourceNotFound: Error while cloning objects [file-GPXPz48JG7JFb0V0q10kYzXz]: The entity container-GPfyjJQJ2JpYxGYFb84Jvfg5 could not be found
```
There is no log for the task itself ("Analyses have not started, no logs present.").
OK, so I think the issue is related to file-GPXPz48JG7JFb0V0q10kYzXz itself: it seems that this file (is it included in the exported VCF list?) cannot be downloaded or accessed.
@Ondrej Klempir, so is it an error on my side? How can I debug this?
FYI, I put my workflow in a gist, with the JSON params masked: https://gist.github.com/lindenb/be95478c8fdc8ca7c74339d413a484be
I changed my params.json; this is the diff:
```
"stage-common.bed":{
"$dnanexus_link":{
"project":"project-1234",
- "_path":"Pierre/456.bed",
"id":"file-7890"
}
},
```
And now the error has changed! Why would an extra key in a JSON object affect anything?
```
job script function run_command exited with permanent fail code 255 aaaaaaaaa.list + read F + bcftools query --regions-file home dnanexus inputs input8250745273061749280 20230209.bed -f [%CHROM t%POS t%END t%INFO SVTYPE t%INFO SVLEN t%FILTER t%SAMPLE t%GT n] project-GP3pv5jJG7J84QX723JbjPF9:file-G2bzJZ8JkF6PX01z7gxkFk2F [E::hts_open_format] Failed to open file project-GP3pv5jJG7J84QX723JbjPF9:file-G2bzJZ8JkF6PX01z7gxkFk2F : No such file or directory Failed to read from project-GP3pv5jJG7J84QX723JbjPF9:file-G2bzJZ8JkF6PX01z7gxkFk2F: No such file or directory
```
So it looks like the input .bed file is not used correctly (?)
```
--regions-file home dnanexus inputs input8250745273061749280 20230209.bed
```
and a file containing a list of full paths cannot be used either (?)
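In the log above, the line read from `aaaaaaaaa.list` is a `project-…:file-…` string, i.e. a DNAnexus object ID rather than a path on the worker, so bcftools tries to open it as a local file and fails. Presumably only values declared as WDL `File` inputs are downloaded to the worker; IDs written into a text file are not. A minimal sketch of a pre-check that flags such entries, assuming bash; `check_list_entries` is a hypothetical name and the ID pattern is only a loose approximation of DNAnexus IDs:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a list file meant to hold one local path per line.
# Flags entries that look like DNAnexus object IDs and entries that are not
# readable local files. Returns non-zero if anything was flagged.
check_list_entries() {
  local list="$1" entry status=0
  # Loose pattern: optional "project-XXXX:" prefix followed by "file-XXXX".
  local id_re='^(project-[A-Za-z0-9]+:)?file-[A-Za-z0-9]+$'
  while IFS= read -r entry || [[ -n "$entry" ]]; do
    [[ -z "$entry" ]] && continue    # skip blank lines
    if [[ "$entry" =~ $id_re ]]; then
      echo "DNAnexus ID, not a local path: $entry"
      status=1
    elif [[ ! -r "$entry" ]]; then
      echo "missing or unreadable: $entry"
      status=1
    fi
  done < "$list"
  return "$status"
}
```

For example, `check_list_entries aaaaaaaaa.list` placed before the `while read F` loop would report the `project-…:file-…` entry instead of letting bcftools fail on it.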
Instead of the `"$dnanexus_link"` object:
A) What if you specify the input the way this example does? https://github.com/dnanexus/dxCompiler/blob/develop/contrib/beginner_example/bam_chrom_counter_input.json
https://github.com/dnanexus/dxCompiler/blob/develop/contrib/beginner_example/bam_chrom_counter.wdl is the WDL code corresponding to the JSON above; could you review the syntax in its `command <<< >>>` section and compare?
B) What if you hardcode the input instead of providing it in the JSON?
C) Alternatively, what happens if you run the compiled workflow via the GUI and specify the bed input in the graphical input field?
At this stage of your development, it's probably best to send a request to ukbiobank-support@dnanexus.com, since the support team can inspect your project and log files.
Changing the JSON object to a plain string produces the following error (dx version is dx v0.338.1):
```
dxpy.exceptions.InvalidInput: i/o value bed needs to be given using DNAnexus links, code 422. Request Time=1676558620.165655, Request ID=1676556929825-975130
Details: {
"field": "bed",
"reason": "malformedLink",
"expected": "not a mapping"
}
```
I switched the version back from 1.1 to 1.0 and got the same error: `dxpy.exceptions.InvalidInput: i/o value bed needs to be given using DNAnexus links`.
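The `malformedLink` error says that, in a `dx run` input file, a file input such as `bed` must be a DNAnexus link object rather than a plain string. As far as I know, the DNAnexus API accepts both `{"$dnanexus_link": "file-…"}` and `{"$dnanexus_link": {"project": "project-…", "id": "file-…"}}`. A small sketch that prints either form; `dx_link` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a DNAnexus link object for a file ID.
# With a second argument, the link also names the project to read the file from.
dx_link() {
  local file_id="$1" project_id="${2:-}"
  if [[ -n "$project_id" ]]; then
    printf '{"$dnanexus_link": {"project": "%s", "id": "%s"}}\n' "$project_id" "$file_id"
  else
    printf '{"$dnanexus_link": "%s"}\n' "$file_id"
  fi
}
```

For example, `printf '{"stage-common.bed": %s}\n' "$(dx_link file-7890 project-1234)" > params.json` followed by `dx run workflow-1234 -f params.json` (using the masked IDs from this thread).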
Good to know, thanks. Since it is a BED file, could you run bcftools without it, as a test run?
The following code worked:
```
runtime {
cpu : 1
memory : "1GB"
docker: "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"
}
```
(Thanks to the B Slavik / UKBB support)