Can I define a cohort in RStudio Workbench instead of using the Cohort Browser?

Po-Wen Ku

I would like to create a cohort from scratch using RStudio. How can I work with the full dataset, for example, by extracting participants with specific ICD codes from the 'HESIN_DIAG' table? Is there any guidance I can refer to? I found relevant information in this post:

https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/16019586962973-how-can-I-define-a-cohort-without-using-the-cohort-browser

However, I am not familiar with JupyterLab, which makes it difficult for me to complete the task within a short time frame.

Thanks

 

Comments


  • Rachael W (UKB Community team, Data Analyst)

    Hi Po-Wen Ku,

    It is not possible to access the tabular data in the Parquet database from RStudio. Accessing the Parquet database requires a Spark instance, which RStudio does not use. You can, however, extract the full hesin_diag table using the Table Exporter tool, or create a cohort in a Spark JupyterLab instance. Some notebooks that may be useful for this are available at https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access.
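As a rough illustration of the Table Exporter route mentioned above, the sketch below filters an exported hesin_diag CSV down to the participants who have at least one matching ICD-10 code. It is a minimal sketch under stated assumptions, not an official UK Biobank method: the column names `eid` and `diag_icd10`, the helper name `eids_with_icd10`, and the sample rows are all assumptions made for illustration, so check them against the header of your own export.

```python
import csv
import io


def eids_with_icd10(csv_file, prefixes, code_column="diag_icd10", id_column="eid"):
    """Return the set of participant IDs with any diagnosis code starting with one of `prefixes`.

    The column names are assumptions; adjust them to match the exported file.
    """
    prefixes = tuple(p.upper() for p in prefixes)
    matched = set()
    for row in csv.DictReader(csv_file):
        # Missing or empty codes are skipped rather than treated as matches.
        code = (row.get(code_column) or "").strip().upper()
        if code and code.startswith(prefixes):
            matched.add(row[id_column])
    return matched


# Made-up rows for illustration only; this is not real UK Biobank data.
sample = io.StringIO(
    "eid,ins_index,arr_index,level,diag_icd10\n"
    "1000001,0,0,1,I210\n"
    "1000001,1,0,1,E119\n"
    "1000002,0,0,1,J459\n"
    "1000003,0,0,2,I219\n"
    "1000004,0,0,1,\n"
)

cohort = eids_with_icd10(sample, ["I21"])
print(sorted(cohort))  # ['1000001', '1000003']
```

For a real export, the same function can be given an open file handle instead of the in-memory sample, for example `open("hesin_diag.csv", newline="")`, where the file name is hypothetical. Matching on code prefixes keeps all subcodes of a category (here everything under I21) without listing each one.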

  • Po-Wen Ku

    Hi Rachael,

    Thanks a lot for your help!

    I was able to use the Spark instance and access the database by following the instructions in some of the Notebooks. Using a JupyterLab instance is more convenient for me. It also allows me to document and trace my analysis through code rather than relying on the web interface.

    Really appreciate your support!

     

