Low scratch storage space when recoding PLINK files to VCF

Hi,

I got an error message when recoding PLINK files to VCF files. I wonder if there is any way to get around the issue.

Thanks in advance.

Here is the error message:

PLINK v2.00a3.1LM 64-bit Intel (19 May 2022) www.cog-genomics.org/plink/2.0/

(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to ukb22418_v2_b0.log.

Options in effect:

--bfile ukb22418_v2_b0_merged

--export vcf id-paste=iid

--out ukb22418_v2_b0

 

Start time: Sat May 20 20:09:42 2023

7653 MiB RAM detected; reserving 3826 MiB for main workspace.

Using up to 4 compute threads.

488377 samples (264719 females, 223412 males, 246 ambiguous; 488377 founders)

loaded from ukb22418_v2_b0_merged.fam.

784256 variants loaded from ukb22418_v2_b0_merged.bim.

Note: No phenotype data present.

CPU: 23% (4 cores) * Memory: 1282/7653MB * Storage: 130/323GB * Net: 156?/1?MBps

May 20, 2023 10:12 PM

CPU: 2% (4 cores) * Memory: 1304/7653MB * Storage: 265/323GB * Net: 0?/0?MBps

May 20, 2023 10:22 PM

--export vcf to ukb22418_v2_b0.vcf ... 0%0%1%1%2%2%3%3%4%4%5%5%6%6%7%7%8%8%9%9%10%10%11%11%12%12%13%13%14%14%15%

May 20, 2023 10:26 PM

Error: File write failure: No space left on device.

End time: Sat May 20 20:26:15 2023


 

Comments

5 comments

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Have you tried increasing the storage? We cover this in the overview webinar and in "How to Run Tools Already Available on the UK Biobank Research Analysis Platform":

    https://www.youtube.com/watch?v=uT_jD1Ey3Fk

    https://youtu.be/U8QZAGwnUm0?t=738

  • Thanks for your reply. Do you mean slide 24 of the webinar, about instance types? My initial submission used mem1_ssd1_v2_x16, which failed with the 'Low scratch storage space' error message. Then I reran the same job using mem3_ssd1_v2_x96, but it still failed with exactly the same error message. I'm not sure what would help.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Since the error concerns storage, I recommend focusing on the storage part of the instance type (ssd1) rather than the memory part (mem). For example, you can use ssd2 or hdd1/ssd3, which have roughly double or quadruple the storage of ssd1, and increase the number of cores, which acts as the multiplication factor. You can see the storage size for each instance type in the rate card: https://20779781.fs1.hubspotusercontent-na1.net/hubfs/20779781/Product%20Team%20Folder/Rate%20Cards/BiobankResearchAnalysisPlatform_Rate%20Card_Current.pdf

     

    It might also be worth calculating how much storage you need before you run the job. At a minimum, the instance must hold the input and output files, but you also need to leave room for intermediate results.

     

    One thing worth checking, though: did the instances with more storage run longer before failing (since there is more space to work with before the error occurs)? I just want to make sure that the error you got isn't coming from a subjob or something else.

     

  • Thank you. I tried ssd2 and it still failed. In the end, I had to recode the data one chromosome at a time, which finally worked. Thanks!

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Sounds good. Thanks for sharing the solution with the community!

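The two suggestions in this thread (estimate the storage you need up front, then recode one chromosome at a time) can be sketched roughly as follows. This is only an illustration, not an official DNAnexus or PLINK recipe. The sample and variant counts come from the log in the question. The figure of 4 bytes per genotype is an assumption (an unphased call such as "0/1" plus a tab separator) and ignores the header and the fixed VCF columns. The chromosome list 1-22 and the `_chr` output suffix are also assumptions made for illustration. Under these assumptions the uncompressed VCF comes to roughly 1.5 TB, far more than the 323 GB of scratch space shown in the log.

```python
# Rough sketch: estimate uncompressed VCF size, then build per-chromosome
# plink2 commands. Numbers marked "assumption" are not from the thread.

SAMPLES = 488_377        # from the .fam line in the log
VARIANTS = 784_256       # from the .bim line in the log
BYTES_PER_GENOTYPE = 4   # assumption: "0/1" (3 chars) plus one tab separator


def estimate_vcf_bytes(samples, variants, bytes_per_genotype=BYTES_PER_GENOTYPE):
    """Approximate size of the genotype columns of an uncompressed VCF."""
    return samples * variants * bytes_per_genotype


def per_chromosome_commands(bfile, out_prefix, chromosomes):
    """Build one plink2 command (as an argument list) per chromosome.

    Uses the same options as the original run plus --chr to restrict
    each export to a single chromosome.
    """
    return [
        [
            "plink2",
            "--bfile", bfile,
            "--chr", str(chrom),
            "--export", "vcf", "id-paste=iid",
            "--out", f"{out_prefix}_chr{chrom}",  # hypothetical naming scheme
        ]
        for chrom in chromosomes
    ]


if __name__ == "__main__":
    total = estimate_vcf_bytes(SAMPLES, VARIANTS)
    print(f"Estimated uncompressed VCF size: about {total / 1e12:.2f} TB")

    # Assumption: autosomes 1-22; adjust to the chromosome codes in your .bim.
    commands = per_chromosome_commands(
        "ukb22418_v2_b0_merged", "ukb22418_v2_b0", range(1, 23)
    )
    for cmd in commands:
        # Printed rather than executed; pass each list to subprocess.run()
        # only once you have checked the available storage.
        print(" ".join(cmd))
```

Each per-chromosome output is a fraction of the full file, so it is more likely to fit on the instance, provided finished outputs are uploaded and removed (or the chromosomes are run as separate jobs) before the disk fills. PLINK 2 also documents a `bgz` modifier for `--export vcf` that writes compressed output, which may reduce the space needed further.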
