Questions about extracting proteomics data
I am extracting protein data using the notebook in the following GitHub repository: https://github.com/dnanexus/UKB_RAP/blob/main/proteomics/0_extract_phenotype_protein_data.ipynb. However, Spark initialization fails with the error message: "RuntimeError: Java gateway process exited before sending its port number."
I would greatly appreciate any insights or suggestions on how to resolve this issue. Thank you for your assistance!
Comments
Hi Haoxian,
I recommend using the UK-Biobank GitHub repositories, which cover how to access UKB data, use genomics data, and run workflows within the UKB-RAP. The notebooks A108_Constructing-the-Olink-dataset_R and A101_Explore-phenotype-tables_Python are useful for your enquiry; you can modify the entity parameter to load the Olink data. Because the Olink tables are large, it may be more efficient to analyse them as a PySpark dataframe rather than converting them to a pandas dataframe.
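On the error itself: in general, PySpark raises "Java gateway process exited before sending its port number" when it cannot start the Java process it depends on, often because no Java runtime is visible to the session or JAVA_HOME points to the wrong place. Below is a minimal sketch of a pre-flight check, not a definitive fix. The helper names find_java_for_spark and start_spark are hypothetical (they are not part of PySpark or of the notebooks above), and the check only assumes that Spark's launcher uses JAVA_HOME/bin/java when JAVA_HOME is set and otherwise the java found on PATH.

```python
import os
import shutil


def find_java_for_spark(env=None):
    """Hypothetical helper: return the java executable Spark's launcher
    would most likely use, or None if none can be found.

    Assumption: the launcher prefers $JAVA_HOME/bin/java when JAVA_HOME is
    set and otherwise falls back to the first `java` on PATH.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        # A JAVA_HOME without bin/java is treated as "not found" here,
        # because the launcher would not fall back to PATH in that case.
        return candidate if os.path.isfile(candidate) else None
    # If `env` has no PATH key, shutil.which falls back to the process PATH.
    return shutil.which("java", path=env.get("PATH"))


def start_spark():
    """Hypothetical helper: start Spark only after the Java check passes."""
    java = find_java_for_spark()
    if java is None:
        raise RuntimeError(
            "No usable Java runtime found; check JAVA_HOME and PATH "
            "before initializing Spark."
        )
    # Imported lazily so the check above still works without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()


# Usage in a notebook cell:
#   print(find_java_for_spark())   # path to java, or None
#   spark = start_spark()
```

If find_java_for_spark() returns None, the problem lies with the environment rather than with the notebook code, so it is worth checking how the session was launched before changing anything else.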
Hope this helps,
Harvey
Thank you so much for this comprehensive answer!
Dear Tang, have you successfully downloaded all the proteomics data?