Extract multiple regions from CRAM with samtools view on the RAP
Hi everyone,
I've been struggling with this for quite a bit, and I have asked directly on the samtools github page (https://github.com/samtools/samtools/issues/1720), but we can't seem to find a way to solve it.
When using samtools view in streaming mode (i.e., with the paths starting with /mnt/project/) I need to use the -X option to explicitly point at the index file corresponding to each cram. This seems to work well unless I include the -L option to extract multiple regions from a bed file, regardless of whether I include -L before (i.e., samtools view -b -L "${bed}" -X "${cram}" "${crai}") or after (i.e., samtools view -b -X "${cram}" "${crai}" -L "${bed}") -X. I get the following error:
samtools view: Incorrect number of arguments for -X option. Aborting.
Neither me nor the samtools developers have been able to reproduce this error on our own machines, so they suspect this must somehow be related with the particularities of the environment.
As an alternative, they have suggested using the "##idx##" notation for explicitly pointing at the index, but it doesn't seem to be working. Running
samtools view -b -L "${bed}" "${cram}##idx##${crai}"
doesn't produce any error, but given the time it was taking to complete before I terminated it (compared with simply indicating the region on the call itself, without a bed file), it seems pretty clear that the index wasn't being loaded at all.
Has anyone faced a similar situation? Could the DNA Nexus team check this issue?
Cheers,
Fran
Comments
15 comments
Did you access the file by download them to worker or you use dxfuse? Also, are you using this on Swiss-army-knife app?
Francisco Rodriguez-Algarra
Hi both, thanks so much for your replies.
I am indeed using dxfuse, since downloading the files made it impossible to work on every single WGS alignment. Not only each job became unbearably (and economically unfeasible) slow, but also I would need a single job per alignment, instead of the batches I have at the moment. I asked before about making dxfuse work with batches and I never managed to get a proper answer. The only way I managed to make these jobs work is by splitting into two scripts, one local, such as:
export project=$( dx pwd )
dx run swiss-army-knife \
-iin="${project}scripts/twins_rap.sh" \
-icmd="bash twins_rap.sh" \
--instance-type "mem1_ssd1_v2_x16" \
--destination "${project}data/twins/" \
-y --brief
and one remote, such as:
while read f; do
eid=$( basename "$f" .cram | cut -d_ -f1 )
cram="/mnt/project/${f}"
crai="${cram}.crai"
samtools view -@ 14 -b -X "${cram}" "${crai}" <POSITION> > ${eid}.bam
done < "/mnt/project/data/crams/crams_twins.txt"
As you can see, I do use Swiss Army Knife. I don't know what ttyd is, but I'd be happy to try if you advised me.
The commands that fail to work are the same ones as above but including a bed file, such as:
[...]
bed="/mnt/project/data/beds/an.bed"
samtools view -@ 14 -L "${bed}" -b -X "${cram}" "${crai}" > ${eid}.bam
[...]
or
[...]
bed="/mnt/project/data/beds/an.bed"
samtools view -@ 14 -L "${bed}" -b "${cram}##idx##${crai}" > ${eid}.bam
[...]
if I use the alternative notation that the samtools developers suggested.
Hope this makes it clearer. Don't hesitate to ask if you need further details.
I did a super quick test in ttyd and I am not observing the "samtools view: Incorrect number of arguments for -X option. Aborting." error message.
a) ran ttyd from Tools library
b) opened a web terminal and installed samtools from apt lib
?c) executed exactly the same as you provided
samtools view -@ 14 -L "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/helper_files/xgen_plus_spikein.GRCh38.bed" -b -X "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/X/Y.cram" "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/X/Y.crai" > output.bam
this is not giving me the error, therefore the issue might be related to the Swiss Army Knife environment
Thanks for checking @Ondrej Klempir? , but what does it imply if it's a SAK issue?
In case this is an issue with SAK, it is likely just the recent version of SAK. It seems that e.g. SAK 4.5.0. is working without this error message.
dx run swiss-army-knife/4.5.0
see the full list of SAK versions here: https://ukbiobank.dnanexus.com/app/swiss-army-knife/versions
Thanks so much, @Ondrej Klempir? . It indeed stops throwing the error using 4.5.0. Not sure what might be the main difference, though. I should probably check the samtools version loaded.
In any case, it doesn't seem to be working as expected, at least in terms of running time. Similar to what I experienced with the ##idx## notation, I feel the index is not being loaded when -L is included. I am doing some more formal tests and I will report to the samtools developers. If I get any useful answer, I will report it back here for anyone who might experience the same issue in the future.
As an aside, I'm trying to use ttyd for debugging, but I'm getting a 403 error when trying to open the webpage. Do you happen to know what could be causing it?
From my experience, I had to wait a couple of minutes and open the page after all prerequisites are installed and ttyd is ready to go.
I don't think that is the issue, but I will try again.
It did work when I launched it from the web instead of from my terminal! Thanks anyway :)
Great. Also based on my experience, do not forget to terminate it at the end of your session :)
Hi @Ondrej Klempir? . I have noticed that the jobs on which I have requested SAK 4.5.0 specifically have been marked as "on demand" instead of "spot" and being charged at three times the rate as previous ones. Is that to be expected?
Whether the instance run on-demand or on spot depend on two factors: level of priority that you set for your job and spot market of those instance at that time. There is no differentiation from using different version of the app itself.
If you don' like to run job on-demand, you may use different job priority level here https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/managing-job-priority#priority-levels.
Please sign in to leave a comment.