Why is HAIL SampleQC script stalling in JupyterLab?
Hello,
We are trying to apply our in-house Hail sample QC script to the WES data on the UKB RAP.
We are beginning to test its use on the UKB RAP by running it on only one chromosome pVCF (starting small with chr21 and chrY). In this case, the code we are running is:
`python hail_sample_qc.py file:///opt/notebooks/LCR-hg38-noHLA.interval_list 'test' file:///opt/notebooks/with_chr_noMT_dbsnp144.b38.vcf.gz 'file:///mnt/project/Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - final release/ukb23157_cY_b*_v1.vcf.gz' --coding-intervals file:///opt/notebooks/xgen_plus_spikein.GRCh38.bed`
where the LCR file specifies low-complexity regions, and the with_chr_noMT_dbsnp144.b38.vcf.gz file is our reference VCF.
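As a sanity check on the input side, a minimal sketch along the following lines could report how many pVCF blocks the glob in the command above matches and how large they are in total. It is not part of hail_sample_qc.py: the helper name `summarize_inputs` is hypothetical, and it assumes the project is mounted locally under /mnt/project so the pattern can be expanded with the standard library alone.

```python
import glob
import os


def summarize_inputs(pattern):
    """Return (number of files, total size in bytes) for a local glob pattern."""
    # The command passes file:// URIs; glob needs the plain filesystem path.
    if pattern.startswith("file://"):
        pattern = pattern[len("file://"):]
    paths = sorted(glob.glob(pattern))
    total_bytes = sum(os.path.getsize(p) for p in paths)
    return len(paths), total_bytes


if __name__ == "__main__":
    # Same pattern as in the command above (chrY blocks).
    pattern = (
        "file:///mnt/project/Bulk/Exome sequences/"
        "Population level exome OQFE variants, pVCF format - final release/"
        "ukb23157_cY_b*_v1.vcf.gz"
    )
    n_files, total_bytes = summarize_inputs(pattern)
    print(f"{n_files} files, {total_bytes / 1e9:.2f} GB")
```

Running this before the QC script would at least confirm that the pattern resolves to the expected files and give a rough idea of how much data Stage 3 has to read.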
This script has been used successfully in the past on our institute's computing cluster with locally stored VCF files.
As shown below, Hail seems to initialize properly and even completes several initial steps successfully. However, at the start of "Stage 3", the script stalls and appears to get stuck. I have also attached the log files from two attempts at running the script, both of which stalled.
For reference, I am working on a Spark JupyterLab cluster with 96 cores, 187.5GB of total memory, and 9600GB of total storage. Each chromosome contains the WES data of roughly 470,000 participants.
Does anyone have suggestions for what may be causing this stalling? I appreciate the help.
Comments
Did you run this before or after our recent JupyterLab update (which happened in late July)? We just want to rule out an issue with the older version first.
Hello, have you solved this problem? I ran into the same error while running the script.