Trouble using $in_prefix and $in_name with swiss-army-knife
Hi! I'm hoping to set up a batch job to extract filters from the vcf files stored at
Bulk/Exome sequences_Alternative exome processing/Exome variant call files (gnomAD) (VCFs)/. I've been testing out the process I wish to run with the following command over just one file:
CMD_STR="bcftools view -G -Ou "\${in_name[0]}" |\
bcftools norm -m - -f GRCh38_reference_genome.fa -Ou |\
bcftools annotate --threads 4 -x INFO,QUAL,ID -Ou |\
bcftools view -H -Oz > "\${in_prefix[0]}".vcf.gz"
dx run swiss-army-knife -iin="$VARIANT_PATH/ukb24068_c11_b43_v1.vcf.gz"\
-iin=<path to FASTA reference file> -iin=<path to FASTA reference index file>\
-icmd="$CMD_STR" --instance-type "mem1_ssd1_v2_x2"
where $VARIANT_PATH provides the path to the directory containing this file.
However, whenever I include any variation on $in_name or $in_prefix in place of the file name for the first bcftools view command, the job fails with the same error:
"Failed to read from ukb24068_c11_b43_v1.vcf.gz: unknown file type"
I've tried this with a number of alternatives for $in_name, and did note that I needed to escape the $ when first specifying my command string in order for this to work at all. Alternatives I've tried include: "\${in_name[0]}", "\${in_name}", "\$in_name", \${in_name}, \$in_name, and "$in_prefix".vcf.gz. Each time, I get this same error. What's interesting is that in all cases, the special keyword is correctly replaced with the proper file name. However, for some reason, this replacement doesn't seem to work the same as just specifying ukb24068_c11_b43_v1.vcf.gz directly. (I've also tried this with other files to make sure the file itself isn't the problem, and I get the same error. I've worked with other files in this directory in other capacities and without trying to use the special $in_name or $in_prefix keywords, and I haven't run into similar issues with bcftools in those instances.) The vcf file appears to be properly loaded as an input from the -iin parameter, so I'm not sure what the issue is.
Have others been able to successfully use $in_prefix and $in_name keywords with swiss army knife from the command line? This seems like a necessary step if I intend to run a batch job with these files, as I won't be able to explicitly specify the file name in each command string of the batch.
Here's the more complete stderr message:
Downloading files using 2 threads
+ [[ '' == '' ]]
+ eval 'bcftools view -G -Ou ${in_name[0]} | bcftools norm -m - -f GRCh38_reference_genome.fa -Ou | bcftools annotate --threads 4 -x INFO,QUAL,ID -Ou | bcftools view -H -Oz > ${in_prefix[0]}.vcf.gz'
++ bcftools view -G -Ou ukb24068_c11_b43_v1.vcf.gz
++ bcftools norm -m - -f GRCh38_reference_genome.fa -Ou
++ bcftools view -H -Oz
++ bcftools annotate --threads 4 -x INFO,QUAL,ID -Ou
Failed to read from ukb24068_c11_b43_v1.vcf.gz: unknown file type
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
Thanks!
Comments
To one of your questions, "Have others been able to successfully use $in_prefix and $in_name keywords with swiss army knife from the command line?", I was able to test $in_prefix in SAK successfully on two VCF files:
A) /Bulk/Exome sequences_Alternative exome processing/Exome variant call files (gnomAD) (VCFs)/
A minimal test bcftools command then failed with the same error you observed:
eval 'bcftools view -x -s * > "$in_prefix".post.vcf.gz';bcftools view -x -s * > "$in_prefix".post.vcf.gz
+ eval 'eval '\''bcftools view -x -s * > "$in_prefix".post.vcf.gz'\'';bcftools view -x -s * > "$in_prefix".post.vcf.gz'
++ eval 'bcftools view -x -s * > "$in_prefix".post.vcf.gz'
+++ bcftools view -x -s ukb24068_cXT_b4Z_v1.vcf.gz
Failed to read from standard input: unknown file type
Could you please check your bcftools command?
UPDATE: My colleague, Nicholas H., looked into this in more detail, and it does look like this could be an issue with the bcftools command.
He tried to modify the CMD_STR until he was able to get a successful job run. The following worked:
CMD_STR='bcftools view -G -Ou "${in_name[0]}" | bcftools norm -m - -f GRCh38_reference_genome.fa -Ou | bcftools annotate --threads 4 -x INFO,QUAL,ID -Ou | bcftools view -H -Oz > "${in_prefix[0]}".vcf.gz'
dx run swiss-army-knife -iin=file-XYZ -iin=project-XYZ:file-XYZ -iin=project-XYZ:file-XYZ -icmd="$CMD_STR" --instance-type "mem1_ssd1_v2_x2" -y --brief
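To see what the quoting change does on its own, here is a small local sketch that compares the two styles without bcftools or dx. It makes some assumptions: in_name is set by hand to a hypothetical file name containing a space (the file names in this thread have none), and eval stands in for the `+ eval` step visible in the stderr above. It only illustrates shell quoting and is not a diagnosis of the bcftools error.

```shell
#!/bin/sh
# Set by hand to mimic the variable Swiss Army Knife provides;
# the file name is hypothetical and contains a space on purpose.
in_name="sample file.vcf.gz"

# Single-quoted: stored verbatim, inner double quotes included.
CMD_SINGLE='printf "%s\n" "${in_name}"'

# Double-quoted with an escaped $: each inner " closes or reopens the outer
# string, so the stored command has no quotes around ${in_name}.
CMD_DOUBLE="printf '%s\n' "\${in_name}""

# Show what each variable actually holds.
printf '%s\n' "$CMD_SINGLE"
printf '%s\n' "$CMD_DOUBLE"

eval "$CMD_SINGLE"   # the file name stays a single argument
eval "$CMD_DOUBLE"   # the file name is split at the space into two arguments
```

With file names that contain no spaces or glob characters, both forms expand to the same command, so this difference matters mostly as a safeguard for a batch run over many files.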
Overall, the error appears to result from the bcftools command and/or the characters that end up in the command string. One suggestion is to wrap the whole command string in single quotes and each ${in_xyz} reference in double quotes, as in the CMD_STR above.
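For the batch use case raised in the question, a rough sketch of reusing one single-quoted command string across several files might look like the following. It is a dry run that prints each dx run command rather than submitting it. All file and project IDs are hypothetical placeholders, maybe_run and submit_all are helper names made up for this example, and it assumes (as in the command above) that the VCF is passed as the first -iin so that ${in_name[0]} refers to it.

```shell
#!/bin/sh
# Dry-run sketch: prints each dx run command instead of submitting it.
# All file and project IDs below are hypothetical placeholders.
DRY_RUN=1
REF_FA="project-XYZ:file-REF"
REF_FAI="project-XYZ:file-REFIDX"

# Single-quoted so ${in_name[0]} and ${in_prefix[0]} reach the worker unexpanded.
CMD_STR='bcftools view -G -Ou "${in_name[0]}" | bcftools norm -m - -f GRCh38_reference_genome.fa -Ou | bcftools annotate --threads 4 -x INFO,QUAL,ID -Ou | bcftools view -H -Oz > "${in_prefix[0]}".vcf.gz'

# Print the command when DRY_RUN=1, otherwise execute it.
maybe_run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Submit one job per VCF; the VCF goes first so it becomes in_name[0].
submit_all() {
  for vcf in "$@"; do
    maybe_run dx run swiss-army-knife -iin="$vcf" -iin="$REF_FA" -iin="$REF_FAI" \
      -icmd="$CMD_STR" --instance-type "mem1_ssd1_v2_x2" -y --brief
  done
}

submit_all "project-XYZ:file-AAA" "project-XYZ:file-BBB"
```

Setting DRY_RUN=0 would execute the commands instead of printing them; the printed form drops the shell quoting around -icmd, so it is meant for inspection rather than copy-and-paste.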