Speed/throttling when submitting 1000s of jobs
Hi
Please could you let me know if there is any constraint against queuing up e.g. 1000 or 2000 jobs to be done as low-priority workers become available? The jobs access a few small, similar files from my platform, as well as different large ones, but I don't think that is the problem, as they aren't even starting.
When I ran just c. 100 jobs, all seemed to start within a few hours and completed quickly.
With 1000 queued, I've noticed that very few have even started. Am I being throttled because of some flag raised, or is it just the time of day and patience required?
Comments
4 comments
Let me know if the below is accurate or just slop.
kthxbi
There is no rule that stops you from submitting thousands of jobs, but two built-in throttles explain why only a few of your 1000 low-priority jobs are actually starting:
Practical guidance for huge submissions
Submitting in waves (e.g. 500 at a time) lets you spot errors early and keeps the monitor view usable.
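As a rough illustration of the wave idea, here is a minimal Python sketch. It assumes you already have some way to launch a single job and get its ID back; `submit_one` and `wave_ok` are hypothetical callbacks you would supply yourself (for example a thin wrapper around your usual launch command), not part of any platform API.

```python
from typing import Callable, Iterator, List, Sequence


def waves(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def submit_in_waves(
    inputs: Sequence[str],
    size: int,
    submit_one: Callable[[str], str],      # hypothetical: launches one job, returns its ID
    wave_ok: Callable[[List[str]], bool],  # hypothetical: your own check on a finished wave
) -> int:
    """Submit wave by wave and stop before the next wave if a check fails.

    Returns the number of jobs that were actually submitted.
    """
    submitted = 0
    for wave in waves(inputs, size):
        job_ids = [submit_one(item) for item in wave]
        submitted += len(job_ids)
        if not wave_ok(job_ids):
            break  # something looks wrong, so do not queue the remaining waves
    return submitted


if __name__ == "__main__":
    samples = [f"sample_{i}" for i in range(1000)]
    # Stand-in callbacks for a dry run; replace with real submission and checks.
    count = submit_in_waves(samples, 500, lambda s: f"job-{s}", lambda ids: True)
    print(count)
```

The point of the `wave_ok` hook is that a mistake in the inputs shows up after the first 500 jobs rather than after all 1000.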
Bottom line: the platform happily accepts thousands of queued jobs, but only the first ~100 per user can run simultaneously, and low-priority work is further gated by spot-instance availability. Raise your worker quota or batch submissions more coarsely if you need faster turn-around.
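To get a feel for what a concurrency cap does to turn-around, here is a small back-of-the-envelope sketch in Python. It takes the ~100 figure quoted above as an assumption and ignores low-priority instance availability entirely, so treat the result as a best case rather than a prediction.

```python
import heapq
from typing import Iterable, List


def makespan(durations: Iterable[float], max_running: int) -> float:
    """Finish time of the last job when at most `max_running` jobs run at once.

    Jobs start in submission order, each as soon as a slot is free.
    Assumes slots are always available once freed, which real low-priority
    capacity does not guarantee.
    """
    if max_running <= 0:
        raise ValueError("max_running must be positive")
    running: List[float] = []  # min-heap of finish times of running jobs
    last_finish = 0.0
    for duration in durations:
        if len(running) < max_running:
            start = 0.0  # a free slot exists from the beginning
        else:
            start = heapq.heappop(running)  # wait for the earliest slot to free up
        end = start + duration
        heapq.heappush(running, end)
        last_finish = max(last_finish, end)
    return last_finish


if __name__ == "__main__":
    # 1000 one-hour jobs with an assumed cap of 100 concurrent jobs.
    print(makespan([1.0] * 1000, 100))
```

With equal-length jobs this reduces to the number of rounds times the job length, so 1000 jobs under a cap of 100 need ten rounds at best.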
Thanks, this is a reasonable answer.
Hi, I am aiming to submit a batch of 1000 jobs as you've spoken about here and just wondered if you could confirm that jobs which exceed the 100 limit and remain in the queue aren't charged?
Short answer: yes, that’s correct 👍
On the UK Biobank RAP (DNAnexus), you are only charged once a job actually starts running on an instance. Jobs that exceed the concurrent running limit (for example, you submit 1000 jobs but only ~100 are allowed to run at once) will simply sit in the queue, and queued jobs do not incur any compute charges.
A few concrete points to make it crisp:
While a job is in the queued/waiting state and has not been assigned an instance, there is zero compute cost.
Charges begin only when the job transitions to running and an AWS instance is allocated.
Submitting 1000 jobs is fine. The platform throttles execution automatically; the excess jobs just wait their turn.
A queued job does not spin up disks, containers, or temporary storage. Those only appear once the job starts.
If a job starts running and then retries or restarts (e.g. spot eviction, failure), each running attempt is billed, but again only for the time it is actually running.
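The billing rule in the points above can be written out as simple arithmetic. This is only a sketch of that rule as described here: the `Attempt` record and the hourly rate are made up for illustration and are not real RAP data structures or prices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    """One attempt of a job (hypothetical record, for illustration only)."""
    queued_hours: float   # time spent waiting for an instance; not billed
    running_hours: float  # time spent running on an instance; billed


def billed_hours(attempts: List[Attempt]) -> float:
    """Sum the running time over all attempts; queued time is ignored."""
    return sum(a.running_hours for a in attempts)


def compute_cost(attempts: List[Attempt], hourly_rate: float) -> float:
    """Compute cost under the rule 'pay only while running'."""
    return billed_hours(attempts) * hourly_rate


if __name__ == "__main__":
    # A job that waited 5 h, ran 0.5 h, was evicted, waited 1 h, then ran 2 h.
    job = [Attempt(queued_hours=5.0, running_hours=0.5),
           Attempt(queued_hours=1.0, running_hours=2.0)]
    print(billed_hours(job))
    print(compute_cost(job, hourly_rate=0.25))  # 0.25 is an invented rate
```

Note that the six hours in the queue never enter the calculation, while both running attempts do, including the one that was cut short.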
This is exactly why large scatter-style submissions (hundreds or thousands of jobs) are a normal and supported pattern on RAP. You can safely fire off the whole batch without worrying about being charged for the backlog sitting patiently in the wings 🐦⬛.
If you want, I can also show you:
dx describe/dx find jobs, or