Extract multiple regions from CRAM with samtools view on the RAP

Permanently deleted user

Hi everyone,

 

I've been struggling with this for quite a bit, and I have asked directly on the samtools github page (https://github.com/samtools/samtools/issues/1720), but we can't seem to find a way to solve it.

 

When using samtools view in streaming mode (i.e., with the paths starting with /mnt/project/) I need to use the -X option to explicitly point at the index file corresponding to each cram. This seems to work well unless I include the -L option to extract multiple regions from a bed file, regardless of whether I include -L before (i.e., samtools view -b -L "${bed}" -X "${cram}" "${crai}") or after (i.e., samtools view -b -X "${cram}" "${crai}" -L "${bed}") -X. I get the following error:

 

samtools view: Incorrect number of arguments for -X option. Aborting.

 

Neither me nor the samtools developers have been able to reproduce this error on our own machines, so they suspect this must somehow be related with the particularities of the environment.

 

As an alternative, they have suggested using the "##idx##" notation for explicitly pointing at the index, but it doesn't seem to be working. Running

 

samtools view -b -L "${bed}" "${cram}##idx##${crai}"

 

doesn't produce any error, but given the time it was taking to complete before I terminated it (compared with simply indicating the region on the call itself, without a bed file), it seems pretty clear that the index wasn't being loaded at all.

 

Has anyone faced a similar situation? Could the DNA Nexus team check this issue?

 

Cheers,

Fran

Comments

15 comments

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Did you access the file by download them to worker or you use dxfuse? Also, are you using this on Swiss-army-knife app?

    0
  • Comment author
    Ondrej Klempir DNAnexus Team

    Francisco Rodriguez-Algarra 

     

    1. What will happen if download cram and index to the worker and do not use /mnt/project/...?
    2. Which envinronment/samtools are you using? Are you running samtools via Swiss Army Knife? If so, can you try ttyd?
    3. What is the exact command you use (including folder path --> you can replace filename by X)?
    0
  • Comment author
    Permanently deleted user

    Hi both, thanks so much for your replies.

     

    I am indeed using dxfuse, since downloading the files made it impossible to work on every single WGS alignment. Not only each job became unbearably (and economically unfeasible) slow, but also I would need a single job per alignment, instead of the batches I have at the moment. I asked before about making dxfuse work with batches and I never managed to get a proper answer. The only way I managed to make these jobs work is by splitting into two scripts, one local, such as:

     

    export project=$( dx pwd )

    dx run swiss-army-knife \

     -iin="${project}scripts/twins_rap.sh" \

     -icmd="bash twins_rap.sh" \

     --instance-type "mem1_ssd1_v2_x16" \

     --destination "${project}data/twins/" \

     -y --brief

     

    and one remote, such as:

     

    while read f; do

     

     eid=$( basename "$f" .cram | cut -d_ -f1 )

     cram="/mnt/project/${f}"

     crai="${cram}.crai"

     samtools view -@ 14 -b -X "${cram}" "${crai}" <POSITION> > ${eid}.bam

     

    done < "/mnt/project/data/crams/crams_twins.txt"

     

    As you can see, I do use Swiss Army Knife. I don't know what ttyd is, but I'd be happy to try if you advised me.

     

    The commands that fail to work are the same ones as above but including a bed file, such as:

     

    [...]

    bed="/mnt/project/data/beds/an.bed"

     samtools view -@ 14 -L "${bed}" -b -X "${cram}" "${crai}" > ${eid}.bam

    [...]

     

    or

     

    [...]

    bed="/mnt/project/data/beds/an.bed"

     samtools view -@ 14 -L "${bed}" -b "${cram}##idx##${crai}" > ${eid}.bam

    [...]

     

    if I use the alternative notation that the samtools developers suggested.

     

    Hope this makes it clearer. Don't hesitate to ask if you need further details.

    0
  • Comment author
    Ondrej Klempir DNAnexus Team

    I did a super quick test in ttyd and I am not observing the "samtools view: Incorrect number of arguments for -X option. Aborting." error message.

     

    a) ran ttyd from Tools library

     

    Screenshot 2022-10-04 at 19.11.55 

    b) opened a web terminal and installed samtools from apt lib

     

    Screenshot 2022-10-04 at 19.16.54 

    ?c) executed exactly the same as you provided

     

    samtools view -@ 14 -L "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/helper_files/xgen_plus_spikein.GRCh38.bed" -b -X "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/X/Y.cram" "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/X/Y.crai" > output.bam

     

    this is not giving me the error, therefore the issue might be related to the Swiss Army Knife environment

    0
  • Comment author
    Permanently deleted user

    Thanks for checking @Ondrej Klempir? , but what does it imply if it's a SAK issue?

    0
  • Comment author
    Ondrej Klempir DNAnexus Team

    In case this is an issue with SAK, it is likely just the recent version of SAK. It seems that e.g. SAK 4.5.0. is working without this error message.

    0
  • Comment author
    Ondrej Klempir DNAnexus Team

    dx run swiss-army-knife/4.5.0

     

    see the full list of SAK versions here: https://ukbiobank.dnanexus.com/app/swiss-army-knife/versions

    0
  • Comment author
    Permanently deleted user

    Thanks so much, @Ondrej Klempir? . It indeed stops throwing the error using 4.5.0. Not sure what might be the main difference, though. I should probably check the samtools version loaded.

     

    In any case, it doesn't seem to be working as expected, at least in terms of running time. Similar to what I experienced with the ##idx## notation, I feel the index is not being loaded when -L is included. I am doing some more formal tests and I will report to the samtools developers. If I get any useful answer, I will report it back here for anyone who might experience the same issue in the future.

    0
  • Comment author
    Permanently deleted user

    As an aside, I'm trying to use ttyd for debugging, but I'm getting a 403 error when trying to open the webpage. Do you happen to know what could be causing it?

    0
  • Comment author
    Ondrej Klempir DNAnexus Team

    From my experience, I had to wait a couple of minutes and open the page after all prerequisites are installed and ttyd is ready to go.

    0
  • Comment author
    Permanently deleted user

    I don't think that is the issue, but I will try again.

    0
  • Comment author
    Permanently deleted user

    It did work when I launched it from the web instead of from my terminal! Thanks anyway :)

    0
  • Comment author
    Ondrej Klempir DNAnexus Team

    Great. Also based on my experience, do not forget to terminate it at the end of your session :)

    0
  • Comment author
    Permanently deleted user

    Hi @Ondrej Klempir? . I have noticed that the jobs on which I have requested SAK 4.5.0 specifically have been marked as "on demand" instead of "spot" and being charged at three times the rate as previous ones. Is that to be expected?

    0
  • Comment author
    Chai Fungtammasan DNAnexus Team

    Whether the instance run on-demand or on spot depend on two factors: level of priority that you set for your job and spot market of those instance at that time. There is no differentiation from using different version of the app itself.

    If you don' like to run job on-demand, you may use different job priority level here https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/managing-job-priority#priority-levels.

    0

Please sign in to leave a comment.