Another Spark issue with writing a Hail matrix:
mt.write(url)
fails after some time (and money) spent:
"ExecutorLostFailure (executor 23 exited caused by one of the running tasks) Reason: Remote RPC client disassociated."
Attached full error message.
What is causing this?
Thanks,
Or
Comments
One more detail: I used high priority nodes only.
Most likely it's an out-of-memory issue. It would be clearer if we could see the event log.
You may find this documentation helpful: https://documentation.dnanexus.com/science/using-hail-to-analyze-genomic-data#guidance-on-scaling-with-hail
Thanks Chai, I will try to set the memory and cores in the builder section, and use more memory.
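For anyone following along, here is a minimal sketch of what setting memory and cores through the Spark session builder might look like before Hail is initialized. The helper names (executor_conf, start_hail), the node size in the usage note, and the 10% overhead split are illustrative assumptions, not values from the DNAnexus documentation. The spark.executor.* keys are standard Spark properties, but whether the jupyter+spark app lets you override them may depend on how the cluster is launched.

```python
def executor_conf(node_memory_gb, node_cores, executors_per_node=1,
                  overhead_fraction=0.1):
    """Split one worker node's resources into Spark executor settings.

    The overhead split is a rough heuristic (an assumption), not a
    recommendation taken from the thread or the linked documentation.
    """
    cores = max(1, node_cores // executors_per_node)
    total_mb = node_memory_gb * 1024 // executors_per_node
    # Reserve part of the memory for off-heap overhead, at least 384 MB.
    overhead_mb = max(384, int(total_mb * overhead_fraction))
    heap_mb = total_mb - overhead_mb
    return {
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{heap_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }


def start_hail(conf):
    """Create a Spark session with the given settings and hand it to Hail."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    import hail as hl

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    # Executor settings only take effect if no Spark context exists yet.
    spark = builder.getOrCreate()
    hl.init(sc=spark.sparkContext)
    return hl
```

For example, start_hail(executor_conf(64, 16, executors_per_node=2)) would request two executors per hypothetical 64 GB, 16-core node. Fewer cores per executor leaves more memory per running task, which is usually the first thing to try when executors are lost during mt.write.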
How do I get the event log?
Users can save the Spark event logs to view the UI later through Spark's history server. Running bash /cluster/dnax/bin/collect_log.sh <output_directory_name> in the terminal of the jupyter+spark app will download the Spark event log to the project. You can then follow the instructions at https://documentation.dnanexus.com/developer/apps/developing-spark-apps#using-spark-history-server to view the UI.
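If you only want to see why executors were lost, without starting the history server, the event log can also be scanned directly. The sketch below assumes the log is an uncompressed text file with one JSON object per line, and that executor removals appear as SparkListenerExecutorRemoved events with "Executor ID" and "Removed Reason" fields, as in Spark's JSON event log format. The function name executor_removals is made up for this example.

```python
import json


def executor_removals(lines):
    """Return (executor_id, reason) for every executor-removed event."""
    removed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        if event.get("Event") == "SparkListenerExecutorRemoved":
            removed.append(
                (event.get("Executor ID"), event.get("Removed Reason"))
            )
    return removed
```

Pass it an open file handle for the collected event log and print the returned pairs. A removal reason that mentions the container being killed for exceeding memory limits would support the out-of-memory explanation.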
thanks!
@Chai Fungtammasan has recently published the following post about Hail troubleshooting for UKB data:
https://community.dnanexus.com/s/question/0D5t000004AflSiCAJ/hail-troubleshooting-for-ukb-data