Questions about extracting proteomics data

Haoxian Tang

I am trying to extract proteomics data using the notebook in the following GitHub repository: https://github.com/dnanexus/UKB_RAP/blob/main/proteomics/0_extract_phenotype_protein_data.ipynb. However, Spark initialization fails with the error message: "RuntimeError: Java gateway process exited before sending its port number."

I would greatly appreciate any insights or suggestions on how to resolve this issue. Thank you for your assistance!

 

Comments


  • Harvey B (UKB Community team, Data Analyst)

    Hi Haoxian,

    I recommend the UK-Biobank repositories on GitHub, which cover how to access UKB data, work with genomics data, and run workflows within the UKB-RAP. The notebooks A108_Constructing-the-Olink-dataset_R and A101_Explore-phenotype-tables_Python are useful for your enquiry; you can modify the entity parameter to load the Olink data. Because the Olink tables are large, it may be more efficient to analyse them as a PySpark DataFrame rather than converting them to a pandas DataFrame.
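As a rough illustration of changing the entity and staying in Spark, here is a minimal sketch. It assumes a Spark-enabled JupyterLab session on the UKB-RAP where the `dxdata` and `dxpy` packages are available. The entity name `olink_instance_0`, the lowercase protein field names, and the helper names (`olink_field_names`, `load_olink_spark`, `open_rap_dataset`) are illustrative assumptions, not something confirmed in this thread, so check them against the notebooks and your own dataset.

```python
from typing import Any, Iterable, List


def olink_field_names(proteins: Iterable[str], id_field: str = "eid") -> List[str]:
    """Build the column list: participant ID first, then one field per protein.

    Assumption: Olink fields are named after the protein in lowercase
    (for example "a1bg"). Verify this against your dataset's data dictionary.
    """
    names = [id_field]
    for protein in proteins:
        name = protein.strip().lower()
        if name and name not in names:  # skip blanks and duplicates
            names.append(name)
    return names


def load_olink_spark(dataset: Any, engine: Any, proteins: Iterable[str],
                     entity: str = "olink_instance_0"):
    """Retrieve the requested Olink fields from the chosen entity.

    The `entity` argument is the parameter to change: "participant" selects
    the main phenotype table, while "olink_instance_0" (assumed name) selects
    the Olink table. The result is returned as-is, without converting it
    to pandas.
    """
    olink = dataset[entity]
    return olink.retrieve_fields(names=olink_field_names(proteins), engine=engine)


def open_rap_dataset():
    """Locate the dispensed dataset and open a Spark connection.

    Only works inside a Spark JupyterLab session on the RAP, so the imports
    are kept local to this function.
    """
    import dxdata
    import dxpy

    found = dxpy.find_one_data_object(
        typename="Dataset", name="app*.dataset", folder="/", name_mode="glob"
    )
    dataset = dxdata.load_dataset(id=found["id"])
    engine = dxdata.connect()
    return dataset, engine
```

On the RAP this would be used roughly as `dataset, engine = open_rap_dataset()` followed by `df = load_olink_spark(dataset, engine, ["A1BG", "AARSD1"])`. With a Spark engine the result should be a Spark DataFrame, so filtering and aggregation can stay in Spark, and `toPandas()` is best reserved for small subsets. The "Java gateway process exited" error generally means PySpark could not start a JVM; one possible cause is running the notebook in a session that was not launched with Spark.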

    Hope this helps,

    Harvey

  • Haoxian Tang

    Thank you so much for this comprehensive answer! 

  • Qian Chen

    Dear Tang, have you successfully downloaded all the proteomics data?
     

