PLINK2 GWAS fails on large sample (~330k) with SpotInstanceInterruption

Shuxin Liu

Hello everyone,

I’m running a binary phenotype GWAS (~330,000 samples) using PLINK2 on the UKB data. My command is:

plink2 --pfile chr1_qc \
      --pheno no.pheno --pheno-name no \
      --covar covariates.cov --covar-name age sex GPC1-GPC20 \
      --glm hide-covar cols=+a1freq no-firth \
      --threads 34 --memory 80000 \
      --out gwas_chr1_lonely


The job frequently fails with:
Cause of Failure: The machine running the job was terminated by the cloud.

I noticed that running chromosomes 1–20 often fails, while running chromosomes 21–22 works fine.

My instance: mem1_ssd1_v2_x36

I suspect it might be memory, shuffle, or Spot instance issues, but I’m not sure.

Could anyone advise:

What is the recommended instance type / memory configuration for a ~330k-sample GWAS with PLINK2?

Are there any tips for avoiding Spot interruptions or crashes on large chromosomes?

Does splitting chromosomes or using smaller batches help?

Thanks in advance

Comments

  • Ahmet Sayici

    If you run your jobs at Low priority, they are likely to fail with SpotInstanceInterruption and then wait indefinitely for another Spot instance. I recommend running them at Normal priority at least, so that a job that fails can restart on an On-demand instance.
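The priority advice above can be sketched as a per-chromosome submission loop. This is only a sketch under assumptions the thread does not confirm: that jobs are launched from the `dx` command line with the Swiss Army Knife app, and that one job per chromosome is acceptable. The names `submit_chr`, `DRY_RUN`, and `INSTANCE` are hypothetical, and the plink2 command is abbreviated.

```shell
# Sketch: one job per chromosome, submitted at Normal priority.
# Assumption: the Swiss Army Knife app runs plink2; input staging is omitted.
# DRY_RUN=1 (the default) only prints the commands instead of submitting them.
DRY_RUN="${DRY_RUN:-1}"
INSTANCE="${INSTANCE:-mem1_ssd1_v2_x36}"

submit_chr() {
  c="$1"
  # Abbreviated plink2 call: the remaining options from the original command
  # (--pheno, --covar, --threads, --memory) would go inside -icmd as well.
  set -- dx run swiss-army-knife \
    -icmd="plink2 --pfile chr${c}_qc --glm hide-covar --out gwas_chr${c}" \
    --instance-type "$INSTANCE" \
    --priority normal \
    --name "gwas_chr${c}" \
    -y
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # show what would be submitted
  else
    "$@"                 # actually submit the job
  fi
}

chr=1
while [ "$chr" -le 22 ]; do
  submit_chr "$chr"
  chr=$((chr + 1))
done
```

Running it as-is prints the 22 commands for review; setting `DRY_RUN=0` submits them. Keeping each chromosome in its own job means an interrupted run only loses that chromosome's work.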

    I'm not sure what your data type is, but it looks like PGEN format, which PLINK2 handles very efficiently. I use mem3_ssd2_v2_x8 for my QC procedure. It is enough for chr1, runs for at most 30 minutes, and is cheaper than mem1_ssd1_v2_x36 even at High priority.

    Edit: I didn't realize you are running GLM there, so I'm not sure the instance I recommended is enough :)
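Whether an instance is "enough" depends partly on whether the value passed to `--memory` (in MiB) fits within the machine's RAM, so it may be worth deriving that value on the worker rather than hard-coding it. The helper below is a minimal sketch: `mem_for_plink` is a hypothetical name, and the 80% headroom figure is an assumption, not a PLINK2 recommendation.

```shell
# Sketch: derive a --memory value (MiB) from the RAM the machine reports.

# mem_for_plink TOTAL_KB PERCENT -> MiB to pass to plink2 --memory
mem_for_plink() {
  echo $(( $1 * $2 / 100 / 1024 ))
}

# MemTotal in /proc/meminfo is reported in kB (Linux only).
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "suggested: --memory $(mem_for_plink "$total_kb" 80)"
fi
```

If the suggested value comes out below the 80000 used in the original command, the job is requesting more workspace than the instance can provide, which would be one thing to rule out before blaming Spot interruptions.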
