Avoid data loss upon SpotInstanceInterruption
I am running a loop with samtools across multiple samples and noticed that my job was restarted (SpotInstanceInterruption). This means that all temporary output files are lost, because they are only uploaded to my project after all samples have been processed. I have two questions:
1) Is there a way to avoid SpotInstanceInterruption? Do I have to run with priority=high/normal instead of low?
2) Looking at my script below, what would be the command to upload the temporary output to the project in each iteration of the loop? And how do I delete the temporary output so that it is not uploaded again when the script finishes?
script.sh:
while IFS= read -r f; do
    eid=$( basename "$f" .cram | cut -d_ -f1 )
    echo "Sample: ${eid}"
    cram="/mnt/project/${f}.cram"
    out="${eid}.processed.bam"
    samtools view ..... -o "${out}" "${cram}"
    # Add some upload command here?
done < "/mnt/project/files.txt"
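A minimal sketch of how script.sh could upload each BAM as soon as it is written, assuming the job is allowed to write to its project (`DX_PROJECT_CONTEXT_ID` is the variable the platform sets to the project the job was launched from). `dx upload` sends the file to the project folder and the local copy is deleted right away, so swiss-army-knife has nothing left to upload at the end. The loop also skips samples whose BAM is already in the project, so a restarted job should continue where the interrupted one stopped, assuming files uploaded before the restart are visible under `/mnt/project` afterwards. `OUT_DIR`, `DONE_DIR`, `VIEW_OPTS` and the helper functions are hypothetical names for this sketch; `VIEW_OPTS` stands in for the elided `.....` samtools options.

```shell
#!/usr/bin/env bash
# Resumable variant of script.sh (sketch, not a tested recipe).
# Assumptions: the job may write to its project; OUT_DIR, DONE_DIR, VIEW_OPTS
# and the function names are hypothetical; earlier uploads show up in /mnt/project.
set -eo pipefail

OUT_DIR="${OUT_DIR:-/Output}"                  # project folder that receives the BAMs
DONE_DIR="${DONE_DIR:-/mnt/project${OUT_DIR}}" # same folder, seen through the /mnt/project mount
VIEW_OPTS=()                                   # hypothetical: put the original "....." samtools options here

# Same sample ID rule as script.sh: file name without .cram, up to the first "_"
sample_id() {
    basename "$1" .cram | cut -d_ -f1
}

# Succeeds if this sample's BAM is already in the project (e.g. from before a restart)
already_done() {
    [ -e "${DONE_DIR}/$1.processed.bam" ]
}

# $1 = input CRAM, $2 = output BAM
run_samtools() {
    samtools view "${VIEW_OPTS[@]}" -o "$2" "$1"
}

# Upload one file to the project folder, then delete the local copy so the
# swiss-army-knife end-of-job upload does not pick it up a second time
upload_and_clean() {
    dx mkdir -p "${DX_PROJECT_CONTEXT_ID}:${OUT_DIR}" < /dev/null
    dx upload "$1" --path "${DX_PROJECT_CONTEXT_ID}:${OUT_DIR}/" --brief < /dev/null
    rm -f "$1"
}

# Process every CRAM listed in $1, uploading each result immediately
process_all() {
    local list="$1" f eid out
    while IFS= read -r f; do
        [ -n "$f" ] || continue   # ignore blank lines
        eid=$(sample_id "$f")
        out="${eid}.processed.bam"
        if already_done "$eid"; then
            echo "Skipping ${eid}: already in project"
            continue
        fi
        echo "Sample: ${eid}"
        run_samtools "/mnt/project/${f}.cram" "$out"
        upload_and_clean "$out"
    done < "$list"
}
```

To use it, append `process_all "/mnt/project/files.txt"` as the last line of script.sh. `OUT_DIR` defaults to `/Output` to match the `--destination` folder in the local script; the `< /dev/null` redirections keep `dx` from reading the loop's input list.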
My local script that calls the above script:
dx run swiss-army-knife \
    -iin="${project}/script.sh" \
    -icmd="bash script.sh '$i'" \
    --instance-type mem1_ssd1_v2_x8 \
    --destination "${project}/Output" \
    --priority low -y --brief
Comments
I think it would be more efficient to parallelise your swiss-army-knife runs, i.e. submit several smaller jobs in parallel rather than one job that loops over every sample.
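One way to parallelise, sketched under assumptions: submit one swiss-army-knife job per sample from your local machine, so a spot interruption only restarts a single sample instead of the whole batch. `process_one.sh` is a hypothetical single-sample version of script.sh that processes the CRAM path passed as `$1`, and the list is a local copy of files.txt. The optional third argument sets `--priority`; my understanding is that `low` lets the platform use spot instances while `high` requests on-demand instances that are not reclaimed, at a higher price, but check the DNAnexus documentation on job priority before relying on that.

```shell
#!/usr/bin/env bash
# Fan-out sketch: one swiss-army-knife job per sample.
# Hypothetical names: process_one.sh (single-sample script.sh), submit_per_sample.
set -eo pipefail

# $1 = project prefix (as in the original ${project}), $2 = local list of CRAM
# paths without .cram, $3 = priority (defaults to low, as in the original)
submit_per_sample() {
    local project="$1" list="$2" priority="${3:-low}" f
    while IFS= read -r f; do
        [ -n "$f" ] || continue   # ignore blank lines
        # --brief prints only the new job ID; </dev/null stops dx reading the list
        dx run swiss-army-knife \
            -iin="${project}/process_one.sh" \
            -icmd="bash process_one.sh '${f}'" \
            --instance-type mem1_ssd1_v2_x8 \
            --destination "${project}/Output" \
            --priority "${priority}" -y --brief < /dev/null
    done < "$list"
}
```

Usage: `submit_per_sample "$project" files.txt` keeps the cheaper low priority, while `submit_per_sample "$project" files.txt high` trades cost for avoiding spot interruptions. Because each job handles one sample, a smaller instance type than `mem1_ssd1_v2_x8` may be enough.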
You may want to create an app https://documentation.dnanexus.com/developer/apps/intro-to-building-apps