Questions about extracting proteomics data

31 October 2024 15:37
3 comments

I am trying to extract protein data using the notebook in the following GitHub repository: https://github.com/dnanexus/UKB_RAP/blob/main/proteomics/0_extract_phenotype_protein_data.ipynb. However, Spark initialization fails with the error message: "RuntimeError: Java gateway process exited before sending its port number."

I would greatly appreciate any insights or suggestions on how to resolve this issue. Thank you for your assistance!

Comments

Harvey B UKB Community team Data Analyst
- Edited 05 November 2024 14:39
Hi Haoxian,
I recommend the UK-Biobank GitHub account, which hosts repositories covering how to access UKB data, use genomics data, and run workflows within the UKB-RAP. The notebooks A108_Constructing-the-Olink-dataset_R and A101_Explore-phenotype-tables_Python are useful for your enquiry; you can modify the entity parameter to load the Olink data. Because the Olink tables are large, it may be more efficient to analyse them as a PySpark DataFrame rather than converting them to a pandas DataFrame.
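On the error quoted in the question: "Java gateway process exited before sending its port number" generally means PySpark could not start a Java process. Common causes are a missing `java` executable or a `JAVA_HOME` that points to the wrong place; on the RAP it may also indicate that the notebook was opened in a JupyterLab session launched without a Spark cluster. The following is a minimal sketch of a check that could be run before initializing Spark. It uses only the Python standard library, and the function name `java_preflight` is hypothetical, not part of any notebook or library mentioned in this thread.

```python
import os
import shutil
import subprocess


def java_preflight():
    """Return (ok, message) describing whether a Java runtime looks launchable.

    Hypothetical helper for illustration; it does not start Spark itself.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        # Spark's launcher scripts use JAVA_HOME/bin/java when JAVA_HOME is set,
        # so a stale JAVA_HOME can break startup even if java is on PATH.
        java = shutil.which("java", path=os.path.join(java_home, "bin"))
        if java is None:
            return False, f"JAVA_HOME is set to {java_home!r} but no java executable was found under its bin directory"
    else:
        java = shutil.which("java")
        if java is None:
            return False, "no java executable on PATH and JAVA_HOME is unset"

    try:
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run {java!r}: {exc}"

    if result.returncode != 0:
        return False, f"{java!r} -version exited with code {result.returncode}"

    # `java -version` normally writes its banner to stderr, not stdout.
    lines = (result.stderr or result.stdout).strip().splitlines()
    return True, lines[0] if lines else "java ran but printed no version banner"
```

Running `ok, msg = java_preflight()` and printing `msg` in the first notebook cell shows whether the problem lies with the Java setup or elsewhere. If Java is missing in a RAP session, relaunching JupyterLab with the Spark cluster option is likely a better fix than installing Java by hand.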
Hope this helps,
Harvey

Haoxian Tang
- 08 November 2024 15:51
Thank you so much for this comprehensive answer!

Qian Chen
- 24 December 2024 06:07
Dear Tang, have you successfully downloaded all the proteomics data?
