PLINK2 GWAS fails on large sample (~330k) with SpotInstanceInterruption
Hello everyone,
I’m running a binary phenotype GWAS (~330,000 samples) using PLINK2 on the UKB data. My command is:
plink2 --pfile chr1_qc \
--pheno no.pheno --pheno-name no \
--covar covariates.cov --covar-name age sex GPC1-GPC20 \
--glm hide-covar cols=+a1freq no-firth \
--threads 34 --memory 80000 \
--out gwas_chr1_lonely
The job frequently fails with:
Cause of Failure: The machine running the job was terminated by the cloud.
I noticed:
Running chromosomes 1–20 often fails.
Running chromosomes 21–22 works fine.
My instance: mem1_ssd1_v2_x36
I suspect it might be memory, shuffle, or Spot instance issues, but I’m not sure.
Could anyone advise:
What instance type / memory configuration is recommended for a PLINK2 GWAS of ~330k samples?
Are there any tips to avoid Spot interruptions or crashes on the large chromosomes?
Would splitting chromosomes or using smaller batches help?
Thanks in advance
Comments
If you run your jobs at Low priority, they are likely to fail with SpotInstanceInterruption and then wait indefinitely for another spot instance. I recommend running them at Normal priority at least, so that a failed job can restart on an On-demand instance.
I'm not sure what your data format is, but it looks like PGEN. PLINK2 is very efficient with PGEN. I use mem3_ssd2_v2_x8 for my QC procedure. It is enough for chr1, runs for at most 30 minutes, and is cheaper than mem1_ssd1_v2_x36 even at High priority.
Edit: I didn't realize you were running GLM there, so I'm not sure the instance I recommended is enough :)
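One thing that may be worth checking whatever instance you choose: whether the 80000 MiB requested with --memory actually fits in the machine's RAM. This is a guess rather than a diagnosis. Below is a minimal sketch that derives the --memory value from what the machine reports instead of hard-coding it. The helper name plink_mem_mib and the 80% margin are my own assumptions, not anything from PLINK2 or the platform, and the script only prints the command so you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Sketch only: size --memory (MiB) from the machine's RAM rather than hard-coding it.
# plink_mem_mib is a hypothetical helper, not part of PLINK2.
# The 80% fraction is an assumed safety margin; adjust to taste.

# $1 = total RAM in KiB; prints 80% of it, converted to MiB.
plink_mem_mib() {
  echo $(( ${1:-0} * 80 / 100 / 1024 ))
}

# On Linux, MemTotal in /proc/meminfo is reported in KiB.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_mib=$(plink_mem_mib "$total_kb")

# Print the command instead of executing it, so it can be checked first.
echo plink2 --pfile chr1_qc \
  --pheno no.pheno --pheno-name no \
  --covar covariates.cov --covar-name age sex GPC1-GPC20 \
  --glm hide-covar cols=+a1freq no-firth \
  --threads "$(nproc)" --memory "$mem_mib" \
  --out gwas_chr1_lonely
```

For example, on a machine with 72 GiB of RAM this would print --memory 58982. If the printed value is well below 80000 on your instance, the original setting was asking for more memory than the machine has.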