Where can I find the BWA Reference Genome Index file?

Permanently deleted user

02 March 2023 00:00
22 comments

I would like to perform somatic mutation analysis with UKB exome sequencing data. My aim is to find CHIP carriers in my cohort. I am planning to use the Mutectcaller (Parabricks accelerated) app to call somatic mutations. However, the BWA Reference Genome Index (*.bwa-index.tar.gz) file, which is a required input, is not in my project's folders. Where can I find this file?

Comments


Chai Fungtammasan DNAnexus Team
- 09 March 2023 15:28
Just want to let you know that we have been in communication with an Nvidia developer to ask about the protocol they use to create the index file. We will share it once we get the info.

Permanently deleted user
- 10 March 2023 08:19
Thank you for the information. I'm looking forward to it.

Ondrej Klempir DNAnexus Team
- 15 March 2023 20:34
Hello {@005t000000BBrFkAAL},

Nvidia made the index file publicly available. From a command line interface, you can run:

wget -O parabricks_sample.tar.gz \
https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz

Here are some ideas on how you can get the index file onto UKB RAP. One option would be to download and extract (untar) the archive with a UKB RAP app. You could run the "wget" command e.g. via the Swiss Army Knife or ttyd app.

Running bwa index creates 5 files (.pac, .ann, .amb, .bwt and .sa). The tarball that you download contains other files as well:

parabricks_sample/
parabricks_sample/Data/
parabricks_sample/Data/sample_2.fq.gz
parabricks_sample/Data/sample_1.fq.gz
parabricks_sample/Ref/
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.pac
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.ann
parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.amb
parabricks_sample/Ref/Homo_sapiens_assembly38.dict
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.fai
parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.bwt
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.sa
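
As a quick sanity check, a sketch like the following could confirm that a tarball contains all five BWA index files. The function name `has_bwa_index` is hypothetical (it is not part of Parabricks or the DNAnexus tools), and it only inspects member names with `tar -tzf`; it does not validate the index contents.

```shell
# has_bwa_index: hypothetical helper, not part of Parabricks or DNAnexus.
# Returns 0 if the gzip-compressed tarball lists a member for each of the
# five BWA index extensions, 1 otherwise.
has_bwa_index() {
  tarball="$1"
  members=$(tar -tzf "$tarball") || return 1
  for ext in amb ann bwt pac sa; do
    if ! printf '%s\n' "$members" | grep -q "\.${ext}\$"; then
      echo "missing .${ext} file in ${tarball}" >&2
      return 1
    fi
  done
  echo "all five BWA index files found in ${tarball}"
}

# Usage (after the wget command above):
#   has_bwa_index parabricks_sample.tar.gz
```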

Feel free to let me know if you face any issues with getting data on the UKB RAP.

And I would like to say thank you to {@005t000000149vjAAA} for helping with this Community post.

Permanently deleted user
- 16 March 2023 15:01
Hello @Ondrej Klempir,

Thanks for the index file. Can I download the index file to my computer using the link, and then upload it to my project on RAP? I think this way will be easier for me.

Ondrej Klempir DNAnexus Team
- 16 March 2023 16:02
Yes, I think it might work. However, I assume this (download to local and then upload to UKB RAP) would be much slower than working on RAP.

Permanently deleted user
- 18 March 2023 10:08
Hi @Ondrej Klempir,

I uploaded all of the files to my project after extracting the tarball. When I tried to run Mutectcaller, I couldn't select a BWA Reference Genome Index file. None of the files I uploaded to my project has the extension .bwa-index.tar.gz, so no results are found when I try to select an index file. How can I solve this problem?

Ondrej Klempir DNAnexus Team
- 20 March 2023 07:25
Hi @Burcu Çevik, my idea would be to create a tar.gz using just the 5 files mentioned above in this thread, i.e. .pac, .ann, .amb, .bwt and .sa. @Gary Burnett, is my thinking correct, please?

Former User of DNAx Community_34
- 20 March 2023 21:33
The easiest way, and the way that I did it, was to download that file from AWS and tar up the Ref/ folder. You can rename the tarball anything.bwa-index.tar.gz so that the software can pick it up, and that should work. That's what I do when I need to run something. Then you can reuse that same reference tarball for anything with Parabricks.
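
The repackaging step described here could be sketched as follows. This is one possible way to do it, not a confirmed Parabricks requirement: the function name `pack_bwa_index` and the label `GRCh38` are hypothetical, and whether the app expects the files inside a `Ref/` folder or at the top level of the archive is an assumption to verify against your own run.

```shell
# pack_bwa_index: hypothetical helper that tars a reference folder under a
# name ending in .bwa-index.tar.gz so the app's file picker can find it.
pack_bwa_index() {
  ref_dir="$1"                 # e.g. parabricks_sample/Ref
  prefix="${2:-reference}"     # arbitrary label for the tarball
  out="${prefix}.bwa-index.tar.gz"
  [ -d "$ref_dir" ] || { echo "not a directory: $ref_dir" >&2; return 1; }
  # -C keeps only the folder name (e.g. Ref/) in the archive paths.
  tar -czf "$out" -C "$(dirname "$ref_dir")" "$(basename "$ref_dir")"
  echo "$out"
}

# Usage, assuming the sample tarball was extracted in the current directory:
#   tar -xzf parabricks_sample.tar.gz
#   pack_bwa_index parabricks_sample/Ref GRCh38
#   dx upload GRCh38.bwa-index.tar.gz   # only needed when working outside RAP
```

The function prints the name of the tarball it wrote, so the result can be passed straight to an upload step.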

Permanently deleted user
- 21 March 2023 11:08
This time, when I ran Mutectcaller, I could successfully select the bwa index file. Thank you very much for the helpful advice.

Ondrej Klempir DNAnexus Team
- 04 April 2023 15:12
Hi @Gary Burnett,
I made a test run using a UKB CRAM file and hit an error: "Logic error: Got an uncompressed chunk but it was null., exiting."

The job log says "input_options: --tumor-name test --in-tumor-bam in_tumor_bam.bam". Maybe some additional parameter needs to be specified for CRAM?

Many thanks, Ondrej

Former User of DNAx Community_34
- 07 April 2023 22:32
Hey @Ondrej Klempir,

Yeah, I'm looking at the source code right now, and it has some bugs in it when it comes to CRAM files. I would use BAM for now if you can while we sort it out.
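
Until CRAM input works, a conversion along these lines may help. This is only a sketch, assuming `samtools` is on the PATH (as it is in Swiss Army Knife) and that you have the reference FASTA the CRAM was encoded against; the function name `cram_to_bam` and the file names are placeholders.

```shell
# cram_to_bam: hypothetical wrapper around samtools.
# $1 = input CRAM, $2 = reference FASTA used to encode the CRAM, $3 = output BAM
cram_to_bam() {
  cram="$1"; ref="$2"; bam="$3"
  # -b writes BAM, -T names the reference needed to decode the CRAM.
  samtools view -b -T "$ref" -o "$bam" "$cram" || return 1
  # Index the BAM so downstream tools can use random access.
  samtools index "$bam"
}

# Usage (placeholder file names):
#   cram_to_bam sample.cram reference.fasta sample.bam
```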

Permanently deleted user
- 27 April 2023 09:36
Hi @Gary Burnett,

I tried using a BAM file that I converted from a CRAM file using the Swiss Army Knife, but I received an "invalid sample name" error. I had read "Sample name MUST match the SM tag in the tumor BAM file." So I tried to extract the SM tag from my BAM file using the Swiss Army Knife, but my output file is completely empty, so I still don't know the SM tag of my BAM file. The command that I used is:

./samtools view example.bam | cut -f12- > tags.txt

Could you give another suggestion, please?

Former User of DNAx Community_34
- 28 April 2023 20:21
Hey @Burcu Çevik,

I have had success with the command: samtools view -H output.bam

It prints out a lot of information, but after sifting through, I am usually able to find my read group.
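
In the SAM/BAM format the sample name is stored in the SM field of the @RG header lines, so it can be filtered out of the `samtools view -H` output instead of sifting by eye. A minimal sketch; `extract_sm` is a hypothetical helper name.

```shell
# extract_sm: hypothetical helper. Reads SAM header text on stdin and prints
# the unique SM (sample) values found on @RG lines, one per line.
extract_sm() {
  awk -F'\t' '/^@RG/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)
  }' | sort -u
}

# Usage:
#   samtools view -H output.bam | extract_sm
```

The printed value is what would be passed as the tumor sample name (the `--tumor-name` option that appears in the job log earlier in this thread).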

Permanently deleted user
- 30 April 2023 10:59
Thank you Gary. I was able to find the SM tag of my BAM file.

Former User of DNAx Community_35
- 12 September 2023 08:21
I am also planning to use the Mutectcaller (Parabricks accelerated) app to call somatic mutations, and I'm having the same problem as you. I uploaded the reference file in .gz format, indexed it with bwa, and created a tar.gz using just the 5 files, but I still cannot run the app successfully. I would really like to know how you did it.

Permanently deleted user
- 12 September 2023 09:36
What error message did you receive? Could you share a screenshot?

Former User of DNAx Community_35
- 12 September 2023 14:12
This is the screenshot. I'm running the Mutectcaller (Parabricks accelerated) app with the CRAM file and a tar.gz containing just the 5 files: .pac, .ann, .amb, .bwt and .sa. The index files were generated from the fasta file using the bwa index hg38.fasta command, but it still reported an error. Have you encountered this, and how should I fix it? Also, regarding the item in the second screenshot: can it be set once so that it is not downloaded automatically later? I do not know what it is. Thank you!

Permanently deleted user
- 12 September 2023 15:33
I never received these error messages before. My suggestion is that you use all 10 files to create the index file. Also, this tool fails when you use a CRAM file as input. You have to use a BAM file for the "bam/cram file for tumor reads" input.

Former User of DNAx Community_35
- 13 September 2023 13:19
Hi! I want to download large files locally. How can I get the md5 value of files on the RAP platform?

Permanently deleted user
- 13 September 2023 14:18
Hi, sorry, I don't know. I didn't need the md5 value when creating the index file.

Chai Fungtammasan DNAnexus Team
- 13 September 2023 16:17
@Li Ping, please post this as a new question. We try to keep one topic per thread.

Former User of DNAx Community_35
- 14 September 2023 02:07
I have found a way to view the md5 value of the file, thanks!
