I want to define a complex cohort. For example, I want to exclude every person who has an ICD code that contains the word kidney OR renal, exclude every person who has cancer, and include only men. Cohort Browser crashes when I try this. Can it be done on the command line, in R, or in JupyterLab / Spark JupyterLab? Could someone please provide an example script? All I can find are guides on how to manipulate cohorts that have already been defined in Cohort Browser.
Comments
You should be able to work with the full data in JupyterLab from the very beginning, with no need to use Cohort Browser. However, I would recommend first exporting the relevant columns and perhaps applying some simple filters. See this post: https://community.dnanexus.com/s/question/0D5t000004SBm0eCAD/query-of-the-week-1-export-phenotypic-data-to-a-file
An example that starts directly from the full data is here: https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb
I think that this doc page explains it well: https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-spark-to-analyze-tabular-data
More specifically, you can use SQL to filter data: https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-spark-to-analyze-tabular-data#tips-for-using-sql
In addition, this Query of the Week post shows somewhat more advanced filtering: https://community.dnanexus.com/s/question/0D5t000004SCYMrCAP/query-of-the-week-2-manipulating-optical-coherence-tomography-data-in-pyspark
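To make the filtering logic from the question concrete, here is a minimal sketch in plain Python. It runs on a few made-up rows rather than real data, and it is not taken from the linked notebooks. The assumptions are: the exported columns are p31 (sex, with 1 = male) and p41270 (a list of ICD-10 codes per participant), "cancer" means any ICD-10 code starting with C, and `icd10_meaning` is a hypothetical code-to-description lookup that you would build yourself from the ICD-10 coding table. The same three conditions could be expressed as pandas or Spark filters once the columns are exported.

```python
import re

# Toy stand-ins for exported columns; every value here is made up.
# Assumed layout: p31 = sex (1 = male, 0 = female), p41270 = list of ICD-10 codes.
participants = [
    {"eid": 1, "p31": 1, "p41270": ["I10"]},
    {"eid": 2, "p31": 1, "p41270": ["N18", "I10"]},  # kidney-related code
    {"eid": 3, "p31": 0, "p41270": ["I10"]},         # not male
    {"eid": 4, "p31": 1, "p41270": ["C61"]},         # cancer code
    {"eid": 5, "p31": 1, "p41270": []},              # no diagnoses recorded
    {"eid": 6, "p31": 1, "p41270": ["N23"]},         # renal-related code
    {"eid": 7, "p31": 1, "p41270": ["E27"]},         # "adrenal" must not match "renal"
]

# Hypothetical code -> description lookup (illustrative entries only).
icd10_meaning = {
    "I10": "Essential (primary) hypertension",
    "N18": "Chronic kidney disease",
    "N23": "Unspecified renal colic",
    "C61": "Malignant neoplasm of prostate",
    "E27": "Other disorders of adrenal gland",
}

# Word boundaries stop "renal" from matching inside "adrenal".
KIDNEY_PATTERN = re.compile(r"\b(?:kidney|renal)\b", re.IGNORECASE)


def codes_matching(meanings, pattern):
    """Return the set of codes whose description matches the pattern."""
    return {code for code, text in meanings.items() if pattern.search(text)}


def is_cancer(code):
    """Assumption: treat every ICD-10 chapter C code as cancer."""
    return code.startswith("C")


def build_cohort(rows, meanings):
    """Keep men with no kidney/renal code and no cancer code."""
    kidney_codes = codes_matching(meanings, KIDNEY_PATTERN)
    cohort = []
    for row in rows:
        codes = row["p41270"] or []
        if row["p31"] != 1:
            continue
        if any(code in kidney_codes for code in codes):
            continue
        if any(is_cancer(code) for code in codes):
            continue
        cohort.append(row["eid"])
    return cohort


cohort_eids = build_cohort(participants, icd10_meaning)
print(cohort_eids)  # prints [1, 5, 7]
```

Resolving the text match to a set of codes first, and only then filtering participants, keeps the expensive string search on the small lookup table rather than on every participant row.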
Thanks Ondrej, I'll work through these. Is there an update for this code block, which appears in all the documentation but doesn't work any more?
def fields_for_id(field_id):
    from distutils.version import LooseVersion
    field_id = str(field_id)
    fields = participant.find_fields(name_regex=r'^p{}(_i\d+)?(_a\d+)?$'.format(field_id))
    return sorted(fields, key=lambda f: LooseVersion(f.name))

# Returns all field names for a given UKB showcase field id
def field_names_for_id(field_id):
    return [f.name for f in fields_for_id(field_id)]

field_names_for_id('31')
ERROR MESSAGE:
/tmp/ipykernel_119/3738977934.py:5: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
return sorted(fields, key=lambda f: LooseVersion(f.name))
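One possible update, offered as a sketch rather than an official fix: the message above is a DeprecationWarning, so on Python versions that still ship distutils the function should still return its result, but distutils was removed from the standard library in Python 3.12. The version below drops LooseVersion and sorts with a small natural-sort key built on `re`. The helper name `natural_key` is my own, and `participant` is passed in as an argument instead of being read from the notebook's global scope; `find_fields(name_regex=...)` is used exactly as in the original snippet.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as integers, so 'a2' sorts before 'a10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs.
    parts = re.split(r"(\d+)", name)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def fields_for_id(participant, field_id):
    """Return all field objects for a given UKB showcase field id."""
    pattern = r"^p{}(_i\d+)?(_a\d+)?$".format(field_id)
    fields = participant.find_fields(name_regex=pattern)
    return sorted(fields, key=lambda f: natural_key(f.name))


def field_names_for_id(participant, field_id):
    """Return all field names for a given UKB showcase field id."""
    return [f.name for f in fields_for_id(participant, field_id)]
```

In the notebook this would be called as field_names_for_id(participant, '31'). The warning text suggests packaging.version, but that module is designed for software version strings, so field names such as p31_i0 may not parse with it; a plain natural sort avoids the dependency altogether.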