PLINK2 GWAS fails on large sample (~330k) with SpotInstanceInterruption

14 January 2026 03:36
1 comment

Hello everyone,

I’m running a binary phenotype GWAS (~330,000 samples) using PLINK2 on the UKB data. My command is:

plink2 --pfile chr1_qc \
--pheno no.pheno --pheno-name no \
--covar covariates.cov --covar-name age sex GPC1-GPC20 \
--glm hide-covar cols=+a1freq no-firth \
--threads 34 --memory 80000 \
--out gwas_chr1_lonely

The job frequently fails with:
Cause of Failure: The machine running the job was terminated by the cloud.

I noticed:

Running chromosomes 1–20 often fails.

Running chromosomes 21–22 works fine.

My instance: mem1_ssd1_v2_x36

I suspect it might be memory, shuffle, or Spot instance issues, but I’m not sure.

Could anyone advise:

What instance type / memory configuration is recommended for a ~330k-sample GWAS with PLINK2?

Are there any tips to avoid Spot interruptions or crashes for large chromosomes?

Would splitting chromosomes or using smaller batches help?

Thanks in advance

Comments

Ahmet Sayici
- Edited 14 January 2026 10:00
If you run your jobs at Low priority, they are likely to fail with SpotInstanceInterruption and then wait indefinitely for another Spot instance. I recommend running them at Normal priority or higher, so that a failed job can restart on an On-demand instance.
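
As a rough sketch of what that could look like from the command line: the snippet below only prints one `dx run` command per chromosome rather than submitting anything. It assumes the job is launched through the Swiss Army Knife app with `dx run`, which the original post does not state, and it abbreviates the plink2 command and leaves out input staging (`-iin=...`). The only part that matters here is `--priority normal`.

```shell
#!/usr/bin/env bash
# Sketch only: prints the dx commands instead of submitting them.
# Assumptions (not from the thread): the Swiss Army Knife app is used,
# and the input files are staged separately (-iin=... is omitted here).
PRIORITY="normal"              # low | normal | high
INSTANCE="mem1_ssd1_v2_x36"    # instance type from the original post

build_cmd() {
  local chr="$1"
  # Abbreviated plink2 call; the covariate, thread and memory options
  # from the original post would be added here as well.
  local plink_cmd="plink2 --pfile chr${chr}_qc --pheno no.pheno --pheno-name no --glm hide-covar no-firth --out gwas_chr${chr}"
  printf 'dx run swiss-army-knife -icmd="%s" --instance-type %s --priority %s -y\n' \
    "$plink_cmd" "$INSTANCE" "$PRIORITY"
}

# Print the command for a couple of chromosomes as a dry run.
for chr in 1 2; do
  build_cmd "$chr"
done
```

Once the printed commands look right, they can be pasted into a terminal where `dx` is logged in, or the loop can be extended to all chromosomes.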

I'm not sure what your data type is, but it looks like PGEN format. PLINK2 is very efficient with PGEN. I use mem3_ssd2_v2_x8 for my QC procedure. It is enough for chr1, runs for at most 30 minutes, and is cheaper than mem1_ssd1_v2_x36 even at High priority.
Edit: I didn't realize you were running GLM there, so I'm not sure whether the instance I recommended is enough :)
