Range request to only download a BAM genomic interval
Hi all,
Using WDL, I would like to fetch a region (chrom:start-end) of a BAM file.
I know I can do this using
```
samtools view "the.bam" "chrom:start-end"
```
However (am I wrong?), before the BAM can be used, RAP will download the **whole** BAM into my session.
A much faster way would be to use HTTP range requests ( https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests ) on the remote BAM.
First, samtools can work with URLs if the remote BAM is indexed and the server supports range requests:
`samtools view "http://www.abc.org/in.bam" "chrom:start-end"`
Second, it looks like AWS S3 supports range requests: https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html
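To make the idea concrete, here is a minimal sketch of a byte-range fetch with `curl`. The helper names `range_header` and `fetch_range` are made up for illustration, and the URL in the usage line is the same placeholder as above. This only shows the HTTP mechanism. samtools issues such requests itself when it reads an indexed remote BAM, so this is not something you would normally do by hand.

```shell
# Build an HTTP Range header for an inclusive byte span (first..last).
range_header() {
  printf 'Range: bytes=%s-%s' "$1" "$2"
}

# Download only bytes first..last of a remote file into a local file.
# A server that honours the header answers "206 Partial Content";
# a server that ignores it answers "200 OK" and sends the whole file.
# Arguments: URL, first byte, last byte, output path (all illustrative).
fetch_range() {
  url=$1; first=$2; last=$3; out=$4
  curl --silent --fail --header "$(range_header "$first" "$last")" --output "$out" "$url"
}
```

For example, `fetch_range "http://www.abc.org/in.bam" 0 65535 head.bin` would request only the first 64 KiB of the file.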
So my question is: is there a way to fetch a section of a BAM in my session without fetching the whole file?
Thanks!
Comments
Hi @Pierre Lindenbaum,
I would first try mounting the file with dxfuse (dxfuse-mnt). Samtools should be compatible with sequential reads from the mount. Please see
https://community.dnanexus.com/s/question/0D5t0000045Gx4GCAS/extract-multiple-regions-from-cram-with-samtools-view-on-the-rap
So I believe there is no need to download the whole file on RAP; you can read it via dxfuse instead.
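As a rough sketch of what that could look like once the project is mounted: the helper name `extract_region` and the mount path in the usage line are assumptions for illustration, not something confirmed for your session. It also assumes the BAM index sits next to the BAM on the mount, since samtools needs it for region queries.

```shell
# Write the reads overlapping one region to a new local BAM file.
# $1: BAM path on the dxfuse mount (hypothetical path)
# $2: region, e.g. chrom:start-end
# $3: output BAM path
extract_region() {
  samtools view -b -o "$3" "$1" "$2"
}
```

Usage would be something like `extract_region "/mnt/project/the.bam" "chrom:start-end" region.bam`, where `/mnt/project` stands in for wherever dxfuse is mounted in your environment.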