WDL: what is the correct way to specify a docker/container? (The entity container-1234 could not be found)

I'm trying to extract variants from a VCF file. My WDL task looks like this:

 

```
task SEARCH_VARIANTS {
    input {
        File vcf
        File bed
    }

    command <<<
        set -e
        set -x
        bcftools query --regions-file '~{bed}' -f '[%CHROM\t%POS\t%END\t%INFO/SVTYPE\t%INFO/SVLEN\t%FILTER\t%SAMPLE\t%GT\n]' "${F}" '~{vcfs}' > output.tsv
    >>>

    output {
        File variants = "output.tsv"
    }

    runtime {
        cpu: 1
        memory: "1GB"
        docker: "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"
    }
}
```

 

But every time the workflow enters the following scatter loop:

 

```
scatter (vcf in SPLIT_FILE.each_vcf) {
    call SEARCH_VARIANTS {
        input:
            vcf = vcf,
            bed = bed
    }
}
```

 

I get the following error, which I think is related to the Docker container:

 

 

> An input or bundled dependency could not be cloned into the project: ResourceNotFound: Error while cloning objects [file-GPXPz48JG7JFb0V0q10kYzXz]: The entity container-GPfyjJQJ2JpYxGYFb84Jvfg5 could not be found
>
> Project: UKBB (project-1234)
> Executable: workflow FindSV (workflow-1234)
> Analysis: FINDSV (analysis-1234)
>
> Error propagated from
>
> Analysis: FINDSV (analysis-1234)
> Stage: scatter (vcf in SPLIT_FILE.each_vcf) (stage-3)
> Executable: applet FindSV_frag_stage-2 (applet-1234)
> Job: scatter (vcf in SPLIT_FILE.each_vcf) (job-1234)
> Error type: InputError

 

What does it mean? How can I fix this?

Comments

10 comments

  • Ondrej Klempir DNAnexus Team

    For Docker, I prefer saving the image to a tar file first and using the following option:

    https://github.com/dnanexus/dxCompiler/blob/develop/doc/ExpertOptions.md#storing-a-docker-image-as-a-file

     

    In this specific case, I wonder whether the issue is the docker spec... Try running "dx describe" on the file file-GPXPz48JG7JFb0V0q10kYzXz. Besides bcftools, can you run other commands on the files, such as head?
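
    The two suggestions above can be sketched in shell as follows. This is a dry-run sketch rather than a verified recipe: it only prints the commands unless DRY_RUN=0 is set, the tarball name bcftools_1.16.tar is a hypothetical placeholder, and it assumes docker and the dx toolkit are installed and that you are logged in.

    ```shell
    #!/bin/sh
    # Sketch: save the image from the question as a tarball, upload it,
    # and inspect the file named in the error message.
    set -eu

    IMAGE="quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"  # image from the question
    TARBALL="bcftools_1.16.tar"                              # hypothetical file name
    DRY_RUN="${DRY_RUN:-1}"                                  # 1 = print only (default)

    # Print the command in dry-run mode, otherwise execute it.
    run() {
        if [ "$DRY_RUN" = "1" ]; then
            printf '+ %s\n' "$*"
        else
            "$@"
        fi
    }

    run docker pull "$IMAGE"
    run docker save -o "$TARBALL" "$IMAGE"
    run dx upload "$TARBALL" --brief     # --brief prints only the new file ID
    run dx describe file-GPXPz48JG7JFb0V0q10kYzXz
    ```

    The file ID printed by dx upload is what would then be referenced in the runtime section as docker: "dx://file-...", following the dxCompiler page linked above.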

  • Former User of DNAx Community_46

    Thanks, I created the Docker image. bcftools works on my local computer:

     

    ```
    $ docker run f0f39c498d87 bcftools --version
    bcftools 1.13
    Using htslib 1.13+ds
    ```

     

    I saved and uploaded the Docker image as a tar file, and used its file ID with the following syntax:

     

    ```
    runtime {
        cpu: 1
        memory: "1GB"
        docker: "dx://file-12345"
    }
    ```

     

    But I still get the following error:

     

    ```
    An input or bundled dependency could not be cloned into the project: ResourceNotFound: Error while cloning objects [file-GPXPz48JG7JFb0V0q10kYzXz]: The entity container-GPfyjJQJ2JpYxGYFb84Jvfg5 could not be found
    ```

     

    There is no log for the task itself ("Analyses have not started, no logs present.").

  • Ondrej Klempir DNAnexus Team

    OK, so I think the issue is related to file-GPXPz48JG7JFb0V0q10kYzXz itself. It seems that the file cannot be downloaded or accessed. Is this file included in the exported VCFs list?

  • Former User of DNAx Community_46

    @Ondrej Klempir So is it an error on my side? How can I debug this?

     

    FYI, I put my workflow in a gist, with the JSON params masked: https://gist.github.com/lindenb/be95478c8fdc8ca7c74339d413a484be

     

  • Former User of DNAx Community_46

    I changed my params.json; this is the diff:

    ```
     "stage-common.bed": {
         "$dnanexus_link": {
             "project": "project-1234",
    -        "_path": "Pierre/456.bed",
             "id": "file-7890"
         }
     },
    ```

     

    And now the error has changed!? Why would an extra key in a JSON object affect anything?

     

    ```
    job script function run_command exited with permanent fail code 255
    aaaaaaaaa.list
    + read F
    + bcftools query --regions-file home dnanexus inputs input8250745273061749280 20230209.bed -f [%CHROM t%POS t%END t%INFO SVTYPE t%INFO SVLEN t%FILTER t%SAMPLE t%GT n] project-GP3pv5jJG7J84QX723JbjPF9:file-G2bzJZ8JkF6PX01z7gxkFk2F
    [E::hts_open_format] Failed to open file project-GP3pv5jJG7J84QX723JbjPF9:file-G2bzJZ8JkF6PX01z7gxkFk2F : No such file or directory
    Failed to read from project-GP3pv5jJG7J84QX723JbjPF9:file-G2bzJZ8JkF6PX01z7gxkFk2F: No such file or directory
    ```

     

    Now it looks like the input BED file is not used correctly (?):

     

    ```

    --regions-file home dnanexus inputs input8250745273061749280 20230209.bed

    ```

     

    And it seems that a list of full paths saved in a file cannot be used (?)
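
    The trace above shows the task reading names from a list (+ read F) and bcftools failing on project-...:file-..., which is a platform identifier rather than a path on the worker. A small check like the one below, placed before the bcftools call, would make that visible. It is only a sketch, assuming the list holds one entry per line; check_local_paths is a hypothetical helper name.

    ```shell
    # Report entries of a list file that are not readable local files.
    # Prints each offending entry and returns non-zero if any were found.
    check_local_paths() {
        list="$1"
        missing=0
        while IFS= read -r F; do
            [ -n "$F" ] || continue      # skip blank lines
            if [ ! -r "$F" ]; then
                printf 'not a local file: %s\n' "$F"
                missing=1
            fi
        done < "$list"
        return "$missing"
    }
    ```

    Inputs declared as File (or Array[File]) in a WDL task are localized by the workflow engine before the command runs, so their ~{...} placeholders expand to real paths. Plain strings read from a text file are not localized.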

     

  • Ondrej Klempir DNAnexus Team

    Instead of "$dnanexus_link":

     

    A) What if you specify it as in this example: https://github.com/dnanexus/dxCompiler/blob/develop/contrib/beginner_example/bam_chrom_counter_input.json ?

    The corresponding WDL code for the JSON above is https://github.com/dnanexus/dxCompiler/blob/develop/contrib/beginner_example/bam_chrom_counter.wdl . Could you review the syntax in the <<<CODE SECTION>>> and compare?

    B) What if you hardcode the input instead of providing it in JSON?

    C) Alternatively, what happens if you run the compiled workflow via the GUI and specify the bed input in the graphical input field?

     

    At this stage of your development, it's probably best to send a request to ukbiobank-support@dnanexus.com, since the support team can inspect your project and log files.

  • Former User of DNAx Community_46

    Changing the JSON object to a plain string produces the following error (dx version is v0.338.1):

     

    ```
    dxpy.exceptions.InvalidInput: i/o value bed needs to be given using DNAnexus links, code 422. Request Time=1676558620.165655, Request ID=1676556929825-975130
    Details: {
        "field": "bed",
        "reason": "malformedLink",
        "expected": "not a mapping"
    }
    ```
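
    The error asks for a DNAnexus link, which is the object form shown in the earlier params.json diff. Below is a minimal sketch for generating that form from a project ID and a file ID. dnanexus_link is a hypothetical helper name, and the IDs are the masked placeholders from this thread.

    ```shell
    # Emit one input value in DNAnexus link form:
    # {"$dnanexus_link": {"project": ..., "id": ...}}
    dnanexus_link() {
        printf '{"$dnanexus_link": {"project": "%s", "id": "%s"}}' "$1" "$2"
    }

    # Placeholder IDs from the thread; the real ones were masked.
    printf '{\n  "stage-common.bed": %s\n}\n' "$(dnanexus_link project-1234 file-7890)"
    ```

    Only the project and id keys are emitted here, matching the edited params.json in which the _path key was removed.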

  • Former User of DNAx Community_46

    I switched the version back from 1.1 to 1.0 and got the same error: dxpy.exceptions.InvalidInput: i/o value bed needs to be given using DNAnexus links.

  • Ondrej Klempir DNAnexus Team

    Good to know, thanks. So if it is the bed file, could you run bcftools without it as a test run?

  • Former User of DNAx Community_46

    The following code worked:

     

    ```
    runtime {
        cpu: 1
        memory: "1GB"
        docker: "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"
    }
    ```

     

    (Thanks to B Slavik / UKBB support)
