Issue Retrieving Fields Using dxdata in Spark Cluster
Dear team,
I am currently using the Spark Cluster on the UKB-RAP and encountered an issue when trying to retrieve fields using dxdata.
I have already defined the entity and the list of fields, and I'm using the following code to extract the data into a Spark DataFrame:
df = participant.retrieve_fields(names=field_names, engine=dxdata.connect())
This approach has worked successfully in the past. However, I now receive the following error message:
ValueError: ca_certs is needed when cert_reqs is not ssl.CERT_NONE
I noticed that the default Spark cluster version in JupyterLab was recently updated to v2.5.0 as of July 2. Could this issue be related to the updated environment?
I would appreciate any suggestions.
Thank you very much.
Comments
Hi Po-Wen,
there was a temporary problem with an expired SSL certificate on June 30th (see https://status.dnanexus.com/). Please try again now.
If there are still issues, please clear your browser cache and cookies and try again. Please also make sure you are logged into the UKB platform via https://ukbiobank.dnanexus.com/login.
If this doesn't help, please contact the platform providers, DNAnexus, via the Help tab > Contact Support in the UKB-RAP, or directly by email to ukbiobank-support@dnanexus.com.
Just as a further update: I still get the same error while running under the same conditions as above, even after clearing cache and cookies.
Hi Erik,
thank you for the update. Please contact DNAnexus with details.
Hi all,
Do you have any updates about this issue?
I also get the same error and I'll contact DNAnexus.
Thanks!
Clair
Hello Clair,
I also reached out to them, and while they work on a fix they provided a workaround: pass dialect="hive+pyspark" to dxdata.connect(), i.e. engine=dxdata.connect(dialect="hive+pyspark").
That worked for me.
I'm getting the same error. Has it been fixed yet? Where do I add dxdata.connect(dialect="hive+pyspark")?
I got it to work by adding it here:
# Pull down the fields we need
df = participant.retrieve_fields(names=field_names, coding_values="replace", engine=dxdata.connect(dialect="hive+pyspark"))
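If the same notebook has to keep running both before and after DNAnexus fixes the default engine, the workaround could be wrapped in a small helper. This is only a minimal sketch under my own assumptions: the name retrieve_with_spark_dialect is hypothetical, and participant and connect are passed in as arguments, so it works with anything that behaves like the entity object and dxdata.connect used above. It falls back to dialect="hive+pyspark" only when the error text contains "ca_certs is needed"; any other ValueError is re-raised.

```python
from typing import Any, Callable, Sequence


def retrieve_with_spark_dialect(
    participant: Any,
    field_names: Sequence[str],
    connect: Callable[..., Any],
) -> Any:
    """Hypothetical helper: try the default engine, then the hive+pyspark workaround.

    Assumes `participant` exposes retrieve_fields(names=..., coding_values=..., engine=...)
    and `connect` behaves like dxdata.connect(); both are injected so this
    function itself does not import dxdata.
    """
    try:
        # Default engine, as in the original question.
        return participant.retrieve_fields(
            names=list(field_names), coding_values="replace", engine=connect()
        )
    except ValueError as err:
        # Only handle the certificate error reported in this thread.
        if "ca_certs is needed" not in str(err):
            raise
    # Workaround from this thread: query through the cluster's Spark session.
    return participant.retrieve_fields(
        names=list(field_names),
        coding_values="replace",
        engine=connect(dialect="hive+pyspark"),
    )
```

On the RAP you would call it as retrieve_with_spark_dialect(participant, field_names, dxdata.connect).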
Hey, this example appears throughout the tutorials and is the first step needed to work with Spark on this platform. Is it really still not working unless you use the workaround hidden in this community post?
The problem still persists. There seems to be a bug with the way you're passing cert_reqs=CERT_NONE.
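To illustrate how a CERT_NONE setting could still trigger this message, here is a hypothetical check of the same shape as the error text. It is not the actual code in dxdata or its dependencies, which I haven't verified, and check_ssl_args is a made-up name. Because the comparison is against the ssl.CERT_NONE constant, a value that only looks like it (for example the string "CERT_NONE", or None) would still fail unless ca_certs is set. Whether that is what happens here is only a guess.

```python
import ssl
from typing import Optional


def check_ssl_args(cert_reqs: object, ca_certs: Optional[str]) -> None:
    """Hypothetical validation mirroring the reported error message."""
    # Anything not equal to the ssl.CERT_NONE constant counts as "verification requested",
    # so a CA bundle path must then be supplied.
    if cert_reqs != ssl.CERT_NONE and not ca_certs:
        raise ValueError("ca_certs is needed when cert_reqs is not ssl.CERT_NONE")


check_ssl_args(ssl.CERT_NONE, None)  # passes: verification disabled
try:
    check_ssl_args("CERT_NONE", None)  # a string is not the ssl constant
except ValueError as err:
    print(err)
```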