I would like to run a somatic mutation analysis on the UKB exome sequencing data. My aim is to find CHIP carriers in my cohort. I am planning to use the Mutectcaller (Parabricks accelerated) app to call somatic mutations. However, the BWA Reference Genome Index (*.bwa-index.tar.gz) file, which the app requires as input, is not in my project's folders. Where can I find this file?
This is the screenshot. I'm running the Mutectcaller (Parabricks accelerated) app with a CRAM file and a tar.gz built from just the 5 files (.pac, .ann, .amb, .bwt and .sa). The index files were generated from the FASTA file with the command bwa index hg38.fasta, but the app still reported an error. Have you encountered this, and how should I fix it? As for the item in the second screenshot, I do not know what it is. Could it have been changed once so that it is no longer downloaded automatically? Thank you!
Comments
Just want to let you know that we have been in contact with an Nvidia developer to ask how they create the index file. We will share the details once we have them.
Thank you for the information. I'm looking forward to it.
Hello {@005t000000BBrFkAAL},
Nvidia made the index file publicly available. From a command line interface, you can run:
wget -O parabricks_sample.tar.gz \
https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz
Here are some ideas for getting the index file onto UKB RAP. One option is to download the tarball and untar it directly on UKB RAP. You could run the "wget" command via, e.g., the Swiss Army Knife or ttyd app.
Running bwa index creates 5 files (.pac, .ann, .amb, .bwt and .sa). The tarball that you download contains other files as well:
parabricks_sample/
parabricks_sample/Data/
parabricks_sample/Data/sample_2.fq.gz
parabricks_sample/Data/sample_1.fq.gz
parabricks_sample/Ref/
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.pac
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.ann
parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.amb
parabricks_sample/Ref/Homo_sapiens_assembly38.dict
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.fai
parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.bwt
parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.sa
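To confirm that the five bwa index files are present next to the FASTA after extraction, a small check like the following may help. It is only a sketch: the function name check_bwa_index is made up for illustration and is not part of bwa, Parabricks or the RAP apps.

```shell
# Sketch: verify that the five files written by `bwa index` exist and are
# non-empty next to a given FASTA. check_bwa_index is a hypothetical helper.
check_bwa_index() {
    fasta="$1"      # path to the .fasta file, without any index extension
    missing=0
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$fasta.$ext" ]; then
            echo "missing or empty: $fasta.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Call it with the path to the .fasta file inside Ref/; it returns a non-zero status and names each index file that is absent.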
Feel free to let me know if you run into any issues getting the data onto UKB RAP.
And I would like to say thank you to {@005t000000149vjAAA} for helping with this Community post.
Hello @Ondrej Klempir,
Thanks for the index file. Can I download it to my computer using the link and then upload it to my project on RAP? I think that would be easier for me.
Yes, I think that should work. However, I expect that downloading locally and then uploading to UKB RAP would be much slower than working directly on RAP.
Hi @Ondrej Klempir,
I untarred the archive and uploaded all of the files to my project. When I tried to run Mutectcaller, I couldn't select a BWA Reference Genome Index file. None of the files I uploaded has the extension .bwa-index.tar.gz, so no results are found when I try to select an index file. How can I solve this problem?
Hi @Burcu Çevik, my idea would be to create a tar.gz from just the 5 files mentioned above in this thread (.pac, .ann, .amb, .bwt and .sa). @Gary Burnett, is my thinking correct?
The easiest way, and the way that I did it, is to download that file from AWS and zip up the Ref/ folder into a tarball. You can name the tarball anything.bwa-index.tar.gz so that the software can pick it up, and that should work. That's what I do when I need to run something. You can then reuse that same reference tarball for anything with Parabricks.
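The repackaging step described above can be sketched as follows. This is a minimal sketch, assuming the sample tarball has already been extracted into a directory that contains Ref/. The function name make_bwa_index_tarball and the output name in the usage line are made up for illustration, and whether the app needs the files under Ref/ or at the top level of the archive is not confirmed in this thread.

```shell
# Sketch: pack the Ref/ folder into a tarball whose name ends in
# .bwa-index.tar.gz so the app's file picker can find it.
# make_bwa_index_tarball is a hypothetical helper, not part of any app.
make_bwa_index_tarball() {
    sample_dir="$1"   # directory that contains Ref/, e.g. parabricks_sample
    out_name="$2"     # any name ending in .bwa-index.tar.gz

    case "$out_name" in
        *.bwa-index.tar.gz) ;;
        *) echo "output name must end in .bwa-index.tar.gz" >&2; return 1 ;;
    esac

    if [ ! -d "$sample_dir/Ref" ]; then
        echo "no Ref/ folder under $sample_dir" >&2
        return 1
    fi

    # -C makes the archived paths relative to the sample directory (Ref/...)
    tar -czf "$out_name" -C "$sample_dir" Ref
}
```

For example, make_bwa_index_tarball parabricks_sample hg38.bwa-index.tar.gz, followed by tar -tzf hg38.bwa-index.tar.gz to check what went into the archive.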
This time, when I ran Mutectcaller, I could successfully select the BWA index file. Thank you very much for your help.
Hi @Gary Burnett,
I made a test run using a UKB CRAM file and got an error: "Logic error: Got an uncompressed chunk but it was null., exiting."
The job log says "input_options: --tumor-name test --in-tumor-bam in_tumor_bam.bam". Maybe some additional parameter needs to be specified for CRAM?
Many thanks, Ondrej
Hey @Ondrej Klempir,
Yeah, I'm looking at the source code right now, and it has some bugs when it comes to CRAM files. I would use BAM for now if you can while we sort it out.
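For anyone following the BAM workaround, the CRAM to BAM conversion can be sketched with samtools as below. This assumes samtools is on the PATH and that the FASTA passed with -T is the same reference the CRAM was written against. The function name cram_to_bam and the file names in the usage line are placeholders.

```shell
# Sketch: convert a CRAM to an indexed BAM with samtools.
# cram_to_bam is a hypothetical helper; all file names are placeholders.
cram_to_bam() {
    cram="$1"        # input CRAM
    ref_fasta="$2"   # reference FASTA the CRAM was encoded against
    bam="$3"         # output BAM

    # -b writes BAM output, -T supplies the reference needed to decode CRAM,
    # -o names the output file
    samtools view -b -T "$ref_fasta" -o "$bam" "$cram" &&
    # build the index for the new BAM
    samtools index "$bam"
}
```

Usage would look like cram_to_bam sample.cram reference.fasta sample.bam.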
Hi @Gary Burnett,
I tried using a BAM file that I converted from a CRAM file with the Swiss Army Knife, but I received an "invalid sample name" error. I had read: "Sample name MUST match the SM tag in the tumor BAM file." So I tried to extract the SM tag from my BAM file using the Swiss Army Knife, but my output file is completely empty, so I still don't know the SM tag of my BAM file. The command that I used is:
./samtools view example.bam | cut -f12- > tags.txt
Could you give another suggestion, please?
Hey @Burcu Çevik,
I have had success with the command: samtools view -H output.bam
It prints out a lot of information, but after sifting through it, I am usually able to find my read group.
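The SM value sits in the @RG lines of the BAM header, which samtools view only prints with -H (or -h), so the header can be filtered instead of read by eye. A minimal sketch, with extract_sm as a made-up helper name:

```shell
# Sketch: print the distinct SM (sample) values found in @RG header lines.
# Reads SAM header text on stdin; extract_sm is a hypothetical helper.
extract_sm() {
    awk -F'\t' '$1 == "@RG" {
        # scan the tab-separated fields of the read-group line for SM:<value>
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}
```

It would be used as samtools view -H output.bam | extract_sm, and the printed value is what the tumor sample name has to match.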
Thank you, Gary. I was able to find the SM tag of my BAM file.
I am also planning to use the Mutectcaller (Parabricks accelerated) app to call somatic mutations, and I'm having the same problem as you. I uploaded the reference file in .gz format, indexed it with bwa, and created a tar.gz from just the 5 files, but I still cannot run the app successfully. I would really like to know how you did it.
What error message did you receive? Could you share a screenshot?
I never received these error messages. My suggestion is to use all 10 files to create the index file. Also, this tool fails when you use a CRAM file as input. You have to use a BAM file for the "bam/cram file for tumor reads" input.
Hi! I want to download large files locally. How can I get the md5 value of files on the RAP platform?
Hi, sorry, I don't know. I have not needed the md5 value when creating the index file.
@Li Ping, please post this as a new question. We try to keep each thread to one topic.
I have found a way to view the md5 value of the file, thanks!