Extracting and saving a specific data-field for all participants in JupyterLab
I am interested in Data-Field 30160 (https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=30160). How do I extract, save and reload the data for this data-field for all participants using JupyterLab?
To add some context, I followed the tutorials here https://github.com/dnanexus/OpenBio/blob/master/dxdata/getting_started_with_dxdata.ipynb and here https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb, but neither explains how to extract data starting from a data-field number; both use field_id instead.
I was able to extract the data on the RAP by creating a cohort; however, that only gives access to 30k participants.
In summary, how do I query the data loaded in JupyterLab for a specific data-field?
Thank you!
Comments
The missing piece of information, how to get the field ID from the data-field, is here:
https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-spark-to-analyze-tabular-data
Then follow the instructions exactly as in:
https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb
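For reference, the columns behind a data-field follow a naming pattern of p<data-field>, optionally followed by _i<instance> and _a<array> (for example p30160_i0). This is the pattern the ukb-rap-pheno-basic notebook's regex matches, and it is assumed here rather than quoted from the documentation page. The sketch below is a small hypothetical helper for working with an extract afterwards: it parses such names and picks out the columns of one data-field. The names parse_field_name and columns_for_data_field are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming: p<data-field>[_i<instance>][_a<array>], e.g. p30160_i0
_FIELD_NAME = re.compile(r"^p(\d+)(?:_i(\d+))?(?:_a(\d+))?$")


class FieldName(NamedTuple):
    data_field: int
    instance: Optional[int]
    array: Optional[int]


def parse_field_name(name: str) -> Optional[FieldName]:
    """Split a column name into data-field, instance and array index.

    Returns None for names outside the pattern, such as "eid".
    """
    m = _FIELD_NAME.match(name)
    if m is None:
        return None
    field, instance, array = m.groups()
    return FieldName(
        int(field),
        int(instance) if instance is not None else None,
        int(array) if array is not None else None,
    )


def columns_for_data_field(columns, data_field: int):
    """Keep only the columns of one data-field, ordered by instance then array."""
    hits = []
    for col in columns:
        parsed = parse_field_name(col)
        if parsed is not None and parsed.data_field == data_field:
            hits.append((parsed, col))

    def order(item):
        parsed = item[0]
        # A missing instance/array sorts first, so p30160 precedes p30160_i0
        return (-1 if parsed.instance is None else parsed.instance,
                -1 if parsed.array is None else parsed.array)

    return [col for _, col in sorted(hits, key=order)]
```

For example, columns_for_data_field(df.columns, 30160) on a pandas DataFrame would list the p30160 columns in numeric order (p30160_i2 before p30160_i10), which a plain string sort would not do.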
The notebook https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb also has a function that retrieves the field ID from the data-field, in case this helps someone else.
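A minimal sketch in the spirit of that notebook helper, extended to pull the data for all participants. It assumes participant = dataset["participant"] (with dataset from dxdata.load_dataset(...)) and engine = dxdata.connect(), and it calls find_fields(name_regex=...) and retrieve_fields(names=..., engine=...) the way the notebook does. The function names are hypothetical, and the dxdata signatures should be checked against the version installed in your session.

```python
import re


def field_names_for_data_field(participant, data_field_id):
    """Return the column names of one data-field, sorted numerically.

    Assumes the p<data-field>[_i<instance>][_a<array>] naming and a dxdata
    participant entity that supports find_fields(name_regex=...).
    """
    pattern = r"^p{}(_i\d+)?(_a\d+)?$".format(int(data_field_id))
    fields = participant.find_fields(name_regex=pattern)

    def numeric_key(name):
        # Compare the digits as numbers so p30160_i2 sorts before p30160_i10
        return [int(n) for n in re.findall(r"\d+", name)]

    return sorted((f.name for f in fields), key=numeric_key)


def extract_data_field(participant, data_field_id, engine):
    """Retrieve eid plus every instance/array column of one data-field."""
    names = field_names_for_data_field(participant, data_field_id)
    if not names:
        raise ValueError(f"no columns found for data-field {data_field_id}")
    # Keep eid so these rows can be joined with other extracts later
    return participant.retrieve_fields(names=["eid"] + names, engine=engine)
```

In a RAP notebook the result is a Spark DataFrame. A single data-field is typically small enough to convert with .toPandas() and write out with to_csv("p30160.csv", index=False). Uploading the file with dx upload keeps it in the project after the JupyterLab session ends, and it can be reloaded later with pandas.read_csv.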
You may also find this thread interesting. It is about exporting data to a file, but some of the tips on using notebooks and dx extract_dataset might be useful.
https://community.dnanexus.com/s/question/0D5t000004SBm0eCAD/query-of-the-week-1-export-phenotypic-data-to-a-file
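As a rough illustration of the dx extract_dataset route, the sketch below builds and runs the command from Python. It assumes dx extract_dataset accepts entity-qualified names in --fields (participant.eid, participant.p30160_i0, ...) and an output path via -o. Both details should be confirmed with dx extract_dataset --help. The dataset ID is a placeholder, and the helper names are hypothetical.

```python
import subprocess


def extract_dataset_command(dataset, field_names, output="data.csv", entity="participant"):
    """Build a dx extract_dataset command line for the given column names.

    `dataset` is a placeholder for the dataset record ID in your project.
    """
    # Always lead with eid, and avoid listing it twice
    names = ["eid"] + [n for n in field_names if n != "eid"]
    # Qualify each column with its entity, e.g. participant.p30160_i0
    fields = ",".join(f"{entity}.{n}" for n in names)
    return ["dx", "extract_dataset", dataset, "--fields", fields, "-o", output]


def run_extract(dataset, field_names, output="data.csv"):
    # Needs the dx toolkit on PATH, as in a RAP JupyterLab session
    subprocess.run(extract_dataset_command(dataset, field_names, output), check=True)
```

Compared with the Spark/dxdata route, this writes a CSV directly, which can then be reloaded with pandas.read_csv in a later session.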