Using Batch Files in Swiss Army Knife for Mutect2 Pipeline
Hello,
I'm using Mutect2 to call somatic mutations from the WES data. To do this, I have to convert the CRAM files to BAM.
# Code to generate batch files
dx generate_batch_inputs --path "project-Gfxv5x0JqZG7QB7qP4bz923g:/Bulk/Exome sequences/Exome OQFE CRAM files/10/" -iin="(.*)_23143_0_0.cram$"
This creates 19 batch files, dx_batch.0000.tsv through dx_batch.0018.tsv.
# Code to alter the structure: puts the project and file IDs into brackets and strips carriage returns
head -n 1 dx_batch.0000.tsv > temp.tsv && tail -n +2 dx_batch.0000.tsv | awk '{sub($3, "[" $3 "]"); sub($4, "[" $4 "]"); print}' >> temp.tsv; tr -d '\r' < temp.tsv > new.tsv; rm temp.tsv
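The command above only handles dx_batch.0000.tsv, so the same step could be looped over all 19 batch files. Here is a minimal sketch of that idea. It assumes the ID to be bracketed sits in a single tab-separated column (column 3 in the example, which is a guess; the one-liner above touches fields 3 and 4). The helper names wrap_column and wrap_all_batches and the bracketed_ output prefix are hypothetical.

```shell
#!/usr/bin/env bash

# Wrap one tab-separated column of every data row in square brackets.
# The header row is passed through unchanged, blank lines are dropped,
# and carriage returns are stripped. Reads stdin, writes stdout.
wrap_column() {
  local col="$1"
  tr -d '\r' | awk -F'\t' -v OFS='\t' -v c="$col" '
    NR == 1 { print; next }        # keep the header unchanged
    NF == 0 { next }               # skip blank lines
    { $c = "[" $c "]"; print }     # bracket the chosen column
  '
}

# Apply wrap_column to every dx_batch.*.tsv in the current directory.
# Output names (bracketed_<original name>) are hypothetical.
wrap_all_batches() {
  local col="$1" f
  for f in dx_batch.*.tsv; do
    [ -e "$f" ] || continue        # nothing to do if the glob matched no files
    wrap_column "$col" < "$f" > "bracketed_$f"
  done
}

# Example (column 3 is an assumption about where the ID column sits):
# wrap_all_batches 3
```

Each bracketed file could then be passed to --batch-tsv in turn. To bracket a second column, the output of wrap_column can be piped through wrap_column again with a different column number.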
# Code to run the job using Swiss Army Knife samtools
# Path to the batch TSV containing the CRAM file IDs (new.tsv from the previous step; temp.tsv has already been removed)
Batch_Tsv="new.tsv"
# Destination folder for output BAM files
Output_Folder="project-ID:/Username/<Your_Folder>"
# Run Swiss Army Knife app using the batch TSV
dx run swiss-army-knife \
--batch-tsv "$Batch_Tsv" \
-icmd='samtools view -T "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/helper_files/GRCh38_full_analysis_set_plus_decoy_hla.fa" -L "/mnt/project/Nicole/DNMT3A_locus.bed" -b "$in_name" > "$in_prefix".bam' \
--destination="$Output_Folder" \
--priority normal \
--instance-type mem1_ssd1_v2_x2 \
--detach --yes
This runs one job per file listed in the batch TSV. Does anyone know how I can alter this pipeline to run one job per, say, 50 files in the batch TSV? Or does anyone have recommendations for alternative ways to run this pipeline?
Comments
Hello Nicole,
Mutect2 should accept CRAM files as input, so the conversion to BAM may not be needed; see: https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2
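As a rough illustration of calling Mutect2 on a CRAM directly, here is a minimal sketch. It assumes gatk is available on the PATH (which may not be the case inside Swiss Army Knife), runs in tumor-only mode with no panel of normals or germline resource, and uses a hypothetical helper name run_mutect2_on_cram. The GATK_BIN variable is only a convenience for dry runs.

```shell
#!/usr/bin/env bash

# GATK launcher; set GATK_BIN=echo to print the command instead of running it.
GATK_BIN="${GATK_BIN:-gatk}"

# Call Mutect2 in tumor-only mode on one CRAM, restricted to the BED intervals.
# Arguments: reference FASTA, input CRAM, BED file, output VCF path.
run_mutect2_on_cram() {
  local reference="$1" cram="$2" intervals="$3" out_vcf="$4"
  "$GATK_BIN" Mutect2 \
    -R "$reference" \
    -I "$cram" \
    -L "$intervals" \
    -O "$out_vcf"
}

# Example with hypothetical local file names:
# run_mutect2_on_cram GRCh38_full_analysis_set_plus_decoy_hla.fa sample.cram DNMT3A_locus.bed sample.vcf.gz
```

The reference passed with -R needs to be the same one the CRAM was written against, and GATK also expects its .fai index and .dict sequence dictionary alongside it.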
I don't believe that samtools view can handle multiple CRAM files at once.
Also, running one job per file should make it possible to choose a smaller, and often less costly, instance.
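If grouping files is still preferred, one option might be to skip --batch-tsv and pass several -iin arguments to a single Swiss Army Knife job, with a loop inside -icmd so that samtools view still sees one CRAM at a time. This is a rough sketch and has not been tested on the platform. It assumes the file IDs are supplied one per line and that the input CRAMs are staged in the job's working directory. The names submit_job and submit_in_chunks and the DX_BIN dry-run switch are hypothetical.

```shell
#!/usr/bin/env bash

# dx launcher; set DX_BIN=echo for a dry run that prints each command.
DX_BIN="${DX_BIN:-dx}"

# Command executed inside each job: convert every staged CRAM, one at a time.
# The reference and BED paths are copied from the original post.
JOB_CMD='for f in *.cram; do samtools view -T "/mnt/project/Bulk/Exome sequences/Exome OQFE CRAM files/helper_files/GRCh38_full_analysis_set_plus_decoy_hla.fa" -L "/mnt/project/Nicole/DNMT3A_locus.bed" -b "$f" > "${f%.cram}.bam"; done'

# Submit one Swiss Army Knife job for the given inputs.
# Arguments: destination folder, then one -iin=<file ID> argument per CRAM.
submit_job() {
  local destination="$1"
  shift
  # stdin is redirected so the launcher cannot consume the ID list being read.
  "$DX_BIN" run swiss-army-knife "$@" \
    -icmd="$JOB_CMD" \
    --destination="$destination" \
    --priority normal \
    --instance-type mem1_ssd1_v2_x2 \
    --detach --yes < /dev/null
}

# Read file IDs (one per line) on stdin and submit one job per chunk.
# Arguments: chunk size, destination folder.
submit_in_chunks() {
  local chunk_size="$1" destination="$2" id
  local -a inputs=()
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue                 # ignore blank lines
    inputs+=("-iin=$id")
    if [ "${#inputs[@]}" -ge "$chunk_size" ]; then
      submit_job "$destination" "${inputs[@]}"
      inputs=()
    fi
  done
  # Submit whatever is left over as a final, smaller job.
  if [ "${#inputs[@]}" -gt 0 ]; then
    submit_job "$destination" "${inputs[@]}"
  fi
}

# Example (the ID column number is an assumption about the batch TSV layout):
# tail -n +2 dx_batch.0000.tsv | cut -f3 | submit_in_chunks 50 "project-ID:/Username/<Your_Folder>"
```

With 50 CRAMs per job, the instance would likely need more storage than mem1_ssd1_v2_x2 provides for a single file, so the instance type is worth revisiting before launching.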
Thank you for getting in touch, I hope this has been helpful.