How to use --batch-tsv with Swiss Army Knife (bcftools in particular)
Hi,
I am trying to run bcftools through the Swiss Army Knife app in batch mode. I can run it successfully in non-batch mode with this command:
dx run app-J2fv5P89f2ZFbj52533QZKPG \
  -iin="PCMT1_AgingDisease:/Bulk/DRAGEN\ WGS/Whole\ genome\ SV\ call\ files\ \(DRAGEN\)\ [500k\ release]/12/XXXXXXX_24059_0_0.dragen.sv.vcf.gz" \
  -iin="PCMT1_AgingDisease:/Bulk/DRAGEN\ WGS/Whole\ genome\ SV\ call\ files\ \(DRAGEN\)\ [500k\ release]/12/XXXXXXX_24059_0_0.dragen.sv.vcf.gz.tbi" \
  -icmd="bcftools view -r chr6:149749695-149811421 -o PCMT1_SV_b.vcf XXXXXXX_24059_0_0.dragen.sv.vcf.gz" \
  -imount_inputs="true" --priority normal --instance-type '{"main" : "mem1_hdd1_v2_x4"}' -y --brief
However, I have not been able to run it in batch mode with ‘dx generate_batch_inputs’, despite trying several approaches.
I used this command to generate the batch file:
dx generate_batch_inputs --path "PCMT1_AgingDisease:/SV_VCFs" -ivcf="^(.*)_0_0\.dragen\.sv\.vcf\.gz$" -itbi="^(.*)_0_0\.dragen\.sv\.vcf\.gz.tbi$"
The header row of the resulting batch file has these five tab-separated columns:
"batch ID", "tbi", "vcf", "tbi ID", "vcf ID"
I also post-processed the file so that the file IDs are enclosed in square brackets, as described on the documentation page: https://documentation.dnanexus.com/user/running-apps-and-workflows/running-batch-jobs
However, when I ran the app with this batch file using the following command:
dx run app-J2fv5P89f2ZFbj52533QZKPG --batch-tsv new.tsv \
  -icmd="bcftools view -r chr6:149749695-149811421 -o PCMTsv.vcf *" \
  -imount_inputs="true" --priority normal --instance-type '{"main" : "mem1_hdd1_v2_x4"}' \
  --destination "/results/" --detach --tag "count" -y --brief --batch-folders
I got this error:
Exception: Mismatch in number of launch_args vs. batch_ids (0 != 5)
I understood that the last four column names in the batch TSV header should be just "in" and "in ID", so I tried that: one "in"/"in ID" pair for the .vcf.gz file and another "in"/"in ID" pair for the .vcf.gz.tbi file. The header row then had these five columns:
"batch ID", "in", "in", "in ID", "in ID"
I created this file with the command below and then manually edited the header, replacing "inb" with "in", because ‘dx generate_batch_inputs’ accepts only one -iin pattern:
dx generate_batch_inputs --path "PCMT1_AgingDisease:/SV_VCFs" -iin="^(.*)_0_0\.dragen\.sv\.vcf\.gz$" -iinb="^(.*)_0_0\.dragen\.sv\.vcf\.gz.tbi$"
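A header edited this way contains "in" and "in ID" twice each, and it is not obvious how the batch runner maps two same-named columns onto the single app input "in". One possibility, not confirmed here, is that only one column of each name is used, which would leave the index file unstaged. A quick way to list repeated column names in a batch TSV is the small bash function below; the name dup_columns is hypothetical and not part of dx.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dx): print every column name that occurs
# more than once in the header row (first line) of a tab-separated batch file.
dup_columns() {
  head -n 1 "$1" |
    tr -d '\r' |      # ignore Windows line endings
    tr '\t' '\n' |    # one column name per line
    LC_ALL=C sort |   # uniq -d needs identical names to be adjacent
    uniq -d           # keep only names that appear more than once
}
```

For a header like the one above, dup_columns new.tsv would print "in" and "in ID".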
With this batch TSV the jobs were submitted successfully, but they failed with this error:
Failed to read from XXXXXXX_24059_0_0.dragen.sv.vcf.gz: could not load index
I get this error even though the batch TSV contains the file IDs of both the .vcf.gz and the .vcf.gz.tbi, each under the "in"/"in ID" columns. In non-batch mode (the first command above), passing -iin once for the .vcf.gz and once for the .vcf.gz.tbi works; only the batch run fails.
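One variant that may be worth trying, assuming that the square brackets mentioned on the documentation page linked above are how an array input such as "in" takes several files in a single cell, is to put both files of each pair into one "in" column and one "in ID" column. The bash function below is a minimal sketch of that conversion; its name, merge_vcf_tbi, is hypothetical. It reads the five-column file from the first ‘dx generate_batch_inputs’ command ("batch ID", "tbi", "vcf", "tbi ID", "vcf ID"), finds the columns by name, strips any square brackets already added by hand, and writes "batch ID", "in", "in ID" with the .vcf.gz listed before its .tbi.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dx): turn a batch TSV with separate
# "vcf"/"tbi" inputs into one with a single array input "in".
# Assumption: an array input accepts several files in one cell written as
# [first, second]; check this against the DNAnexus batch-jobs documentation.
merge_vcf_tbi() {
  awk '
    BEGIN { FS = OFS = "\t" }

    # Remove one pair of surrounding square brackets (and nearby spaces).
    function strip(s) {
      sub(/^ *\[ */, "", s)
      sub(/ *] *$/, "", s)
      return s
    }

    # Drop a trailing carriage return in case the file was edited on Windows.
    { sub(/\r$/, "") }

    NR == 1 {
      # Map each header name to its column number, so column order is irrelevant.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("batch ID" in col) || !("vcf" in col) || !("tbi" in col) ||
          !("vcf ID" in col) || !("tbi ID" in col)) {
        print "merge_vcf_tbi: expected batch ID, vcf, tbi, vcf ID, tbi ID columns" > "/dev/stderr"
        exit 1
      }
      print "batch ID", "in", "in ID"
      next
    }

    # Skip blank lines; otherwise emit the .vcf.gz first, then its index.
    NF > 0 {
      names = "[" strip($(col["vcf"])) ", " strip($(col["tbi"])) "]"
      ids   = "[" strip($(col["vcf ID"])) ", " strip($(col["tbi ID"])) "]"
      print $(col["batch ID"]), names, ids
    }
  ' "$1"
}
```

Typical use would be merge_vcf_tbi new.tsv > merged.tsv, then passing merged.tsv to --batch-tsv. Separately, in the batch -icmd the glob * would also match the .tbi file if it is staged next to the .vcf.gz; since bcftools view treats extra positional arguments as regions, *.vcf.gz is probably the safer pattern.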
I have spent a lot of time on this without success. Any help would be much appreciated.
Thank you