Where can I find the BWA Reference Genome Index file?

Permanently deleted user
I would like to perform somatic mutation analysis on the UKB exome sequencing data; my aim is to find CHIP carriers in my cohort. I am planning to use the Mutectcaller (Parabricks accelerated) app to call somatic mutations. However, the BWA Reference Genome Index (*.bwa-index.tar.gz) file, which the app requires as input, is not in my project's folders. Where can I find this file?

Comments

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Just to let you know: we have been in contact with the NVIDIA developers to ask for the protocol they used to create the index file. We will share it once we get the information.

  • Comment author
    Permanently deleted user

    Thank you for the information. I'm looking forward to it.

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hello @005t000000BBrFkAAL,

    NVIDIA has made the index files publicly available. From a command-line interface, you can run:

    wget -O parabricks_sample.tar.gz \
        https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz

    Here are some ideas for getting the index files onto the UKB RAP. One option is to download the tarball and extract (untar) it from within a UKB RAP app; for example, you could run the wget command in the Swiss Army Knife or ttyd app.
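    A minimal sketch of that download-and-extract step, assuming wget and tar are available in the app's shell (e.g. Swiss Army Knife or ttyd). The URL is the one given above; fetch_sample and unpack_sample are hypothetical helper names, not part of any RAP tool.

```shell
#!/usr/bin/env bash
# Minimal sketch: download the NVIDIA Parabricks sample bundle and unpack it.
# Assumes wget and tar are on PATH in the RAP app (e.g. Swiss Army Knife or ttyd).
# fetch_sample and unpack_sample are hypothetical helper names.

SAMPLE_URL="https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"

# Download the bundle to the given path (default: parabricks_sample.tar.gz).
fetch_sample() {
    local out="${1:-parabricks_sample.tar.gz}"
    wget -O "$out" "$SAMPLE_URL"
}

# Extract a downloaded bundle into DEST (default: current directory)
# and list the reference files under parabricks_sample/Ref/.
unpack_sample() {
    local tarball="$1" dest="${2:-.}"
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    ls "$dest/parabricks_sample/Ref"
}
```

    Inside the app this would be run as fetch_sample && unpack_sample parabricks_sample.tar.gz, after which the extracted files can be uploaded to the project (for example with the dx toolkit's dx upload command).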

    Running bwa index creates five files (.pac, .ann, .amb, .bwt and .sa). The tarball you download contains other files as well:

    parabricks_sample/
    parabricks_sample/Data/
    parabricks_sample/Data/sample_2.fq.gz
    parabricks_sample/Data/sample_1.fq.gz
    parabricks_sample/Ref/
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.pac
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.ann
    parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.amb
    parabricks_sample/Ref/Homo_sapiens_assembly38.dict
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.fai
    parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.bwt
    parabricks_sample/Ref/Homo_sapiens_assembly38.fasta.sa
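    To confirm that a FASTA has all five bwa index files next to it (for example before tarring, or after running bwa index yourself), something like the following could be used. It assumes bwa's default naming, where the index prefix is the FASTA file name itself; check_bwa_index is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: verify that the five files written by `bwa index` exist
# and are non-empty next to a FASTA. Assumes the default index prefix,
# i.e. the FASTA path itself (ref.fasta -> ref.fasta.bwt, ...).
# check_bwa_index is a hypothetical helper name.
check_bwa_index() {
    local fasta="$1" ext missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$fasta.$ext" ]; then
            echo "missing or empty: $fasta.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

    For the bundle above: check_bwa_index parabricks_sample/Ref/Homo_sapiens_assembly38.fasta && echo "bwa index complete". Any missing suffix is reported on stderr and the function returns 1.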

    Feel free to let me know if you run into any issues getting the data onto the UKB RAP.

    I would also like to thank @005t000000149vjAAA for helping with this Community post.

  • Comment author
    Permanently deleted user

    Hello @Ondrej Klempir,

    Thanks for the index file. Can I download it to my computer using the link and then upload it to my project on the RAP? That seems easier for me.

  • Comment author
    Ondrej Klempir DNAnexus Team

    Yes, I think that would work. However, downloading to your local machine and then uploading to the UKB RAP will probably be much slower than doing everything on the RAP.

  • Comment author
    Permanently deleted user

    Hi @Ondrej Klempir,

    After untarring, I uploaded all of the files to my project. But when I run Mutectcaller, I can't select a BWA Reference Genome Index file: none of the files I uploaded has the extension .bwa-index.tar.gz, so the file picker shows no results when I try to select an index file. How can I solve this problem?

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hi @Burcu Çevik, my idea would be to create a tar.gz containing just the five files mentioned above in this thread (.pac, .ann, .amb, .bwt and .sa). @Gary Burnett, is my thinking correct?

  • The easiest way, and the way I did it, was to download that file from AWS and package the Ref/ folder into a tarball. Name the tarball anything.bwa-index.tar.gz so that the software can pick it up, and that should work. That's what I do when I need to run something, and you can then reuse the same reference tarball for anything with Parabricks.
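    A minimal sketch of that packaging step, assuming the bundle has already been extracted to a parabricks_sample/ directory. The default output name GRCh38.bwa-index.tar.gz is only an example; per the advice above, what matters is the .bwa-index.tar.gz ending. make_bwa_index_tarball is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: package the extracted Ref/ folder as a *.bwa-index.tar.gz
# so the Parabricks app's file picker can find it.
# make_bwa_index_tarball and the default output name are hypothetical.
make_bwa_index_tarball() {
    local sample_dir="$1" out="${2:-GRCh38.bwa-index.tar.gz}"
    if [ ! -d "$sample_dir/Ref" ]; then
        echo "no Ref/ folder under '$sample_dir'" >&2
        return 1
    fi
    # -C makes the member paths relative (Ref/...), not absolute.
    tar -czf "$out" -C "$sample_dir" Ref || return 1
    # List the contents so the FASTA, .fai, .dict and bwa files can be checked.
    tar -tzf "$out"
}
```

    Usage: make_bwa_index_tarball parabricks_sample, then upload the resulting archive to the project (for example with dx upload) and select it as the BWA Reference Genome Index input. Tarring the whole Ref/ folder keeps the FASTA, .fai and .dict alongside the five bwa files.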

  • Comment author
    Permanently deleted user

    This time, when I ran Mutectcaller, I was able to select the BWA index file. Thank you very much for your help.

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hi @Gary Burnett,

    I made a test run using a UKB CRAM file and got an error: "Logic error: Got an uncompressed chunk but it was null., exiting."

    The job log says "input_options: --tumor-name test --in-tumor-bam in_tumor_bam.bam". Maybe an additional parameter needs to be specified for CRAM input?

    Many thanks, Ondrej

  • Hey @Ondrej Klempir,

    Yeah, I'm looking at the source code right now, and it has some bugs when it comes to CRAM files. I would use BAM for now, if you can, while we sort it out.
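    Until CRAM input works, one way to get a BAM is to convert the CRAM with samtools, e.g. in the Swiss Army Knife app. A minimal sketch, assuming samtools is on PATH and that you pass the same reference FASTA the CRAM was encoded against (which reference that is for your files is not stated in this thread, so check it). cram_to_bam is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: convert a CRAM to an indexed BAM with samtools.
# Assumes samtools is on PATH and REF is the FASTA the CRAM was encoded against.
# cram_to_bam is a hypothetical helper name.
cram_to_bam() {
    if [ "$#" -ne 3 ]; then
        echo "usage: cram_to_bam REF.fasta IN.cram OUT.bam" >&2
        return 2
    fi
    local ref="$1" in_cram="$2" out_bam="$3"
    # -b: write BAM, -T: reference used to decode the CRAM, -o: output path
    samtools view -b -T "$ref" -o "$out_bam" "$in_cram" || return 1
    # Writes OUT.bam.bai next to the BAM.
    samtools index "$out_bam"
}
```

    Example call (file names hypothetical): cram_to_bam ref.fasta sample.cram sample.bam.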

  • Comment author
    Permanently deleted user

    Hi @Gary Burnett,

    I tried using a BAM file that I converted from the CRAM file with the Swiss Army Knife, but I received an "invalid sample name" error. I had read that "Sample name MUST match the SM tag in the tumor BAM file." So I tried to extract the SM tag from my BAM file using the Swiss Army Knife, but my output file is completely empty, so I still don't know the SM tag of my BAM file. The command I used was:

    ./samtools view example.bam | cut -f12-  > tags.txt

    Could you suggest another approach, please?

  • Hey @Burcu Çevik,

    I have had success with the command: samtools view -H output.bam

    It prints a lot of information (the whole BAM header), but after sifting through it I can usually find my read group (@RG) line, which carries the SM tag.
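    To avoid sifting by hand, the header can be filtered down to just the SM values. A minimal sketch, assuming standard SAM header text on stdin (for example the output of samtools view -H); extract_sm is a hypothetical helper. The value it prints is what the tumor sample name given to the app should match, per the requirement quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the distinct SM (sample name) values found in the
# @RG lines of SAM header text read from stdin.
# extract_sm is a hypothetical helper name.
extract_sm() {
    grep '^@RG' |        # keep read-group header lines only
        tr '\t' '\n' |   # one tag per line
        grep '^SM:' |    # keep the sample-name tags
        cut -d: -f2- |   # drop the "SM:" prefix
        sort -u          # one line per distinct sample name
}
```

    Usage: samtools view -H output.bam | extract_sm. No output means the BAM has no @RG line with an SM tag.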

  • Comment author
    Permanently deleted user

    Thank you, Gary. I found the SM tag of my BAM file.

  • Comment author
    Former User of DNAx Community_35

    I am also planning to use the Mutectcaller (Parabricks accelerated) app to call somatic mutations, and I'm having the same problem as you. I uploaded the reference file in .gz format, indexed it with bwa, and created a tar.gz containing just the five index files. But I still cannot run the app successfully. I would really like to know how you did it.

  • Comment author
    Permanently deleted user

    What error message did you receive? Could you share a screenshot?

  • Comment author
    Former User of DNAx Community_35

    This is the screenshot. I'm running the Mutectcaller (Parabricks accelerated) app with the CRAM file and a tar.gz containing just the five files (.pac, .ann, .amb, .bwt and .sa). The index files were generated from the FASTA file with the command bwa index hg38.fasta, but the app still reports an error. Have you encountered this, and how should I fix it? Also, regarding the second screenshot: could something have gone wrong once, so that it no longer downloads automatically? I don't know what it is. Thank you!

  • Comment author
    Permanently deleted user

    I never received these error messages before. My suggestion is to use all 10 files (the whole Ref/ folder) to create the index file. Also, this tool fails when you use a CRAM file as input: you have to provide a BAM file for the "bam/cram file for tumor reads" input.

  • Comment author
    Former User of DNAx Community_35

    Hi! I want to download large files locally. How can I get the MD5 value of files on the RAP platform?

  • Comment author
    Permanently deleted user

    Hi, sorry, I don't know. I haven't needed the MD5 value when creating the index file.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    @Li Ping, please post this as a new question. We try to keep one topic per thread.

  • Comment author
    Former User of DNAx Community_35

    I found a way to view the MD5 value of the file, thanks!
