Array & instance when filtering for a disease
When filtering for a disease (self-reported or health-recorded, I don't mind), there are numerous categories, and the number of participants changes depending on which ones I add. For example, if I am looking for “irritable bowel syndrome”, I can add it by navigating to:
Health-related outcomes – Hospital inpatient – Record-level access – Hospital diagnosis record – Diagnoses ICD10
or:
Assessment centre – Verbal interview – Medical conditions – Non-cancer illness code, self-reported | instance 0 | array 0
In the assessment centre section, for example, there are multiple instance and array numbers. What do these mean? Do I need to add them all manually, one by one, as filters if I want to filter for a selected disease?
Comments
Hi Endrit, for information on instances and arrays, please see https://community.ukbiobank.ac.uk/hc/en-gb/search?utf8=%E2%9C%93&query=what+is+an+instance+index
To find all the available data, you would need to add all the different instance and array columns. The cohort browser is useful for preliminary analyses, but it has limitations like this. For your main analysis, you will need to use a different method of accessing the tabular data in the Parquet database. Some researchers extract the data of interest into a CSV file using the Table Exporter tool, and then use a JupyterLab instance to process the CSV in R or Python. Other researchers use a Spark JupyterLab and Spark commands to interact with the Parquet database more directly. There are some template notebooks that might be useful in https://github.com/UK-Biobank/UKB-RAP-Notebooks-Access
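As a rough illustration of the CSV route described above, here is a minimal Python sketch that checks every instance and array column of one field for a single illness code. Several details are assumptions rather than anything confirmed in this thread: that the exported columns are named like `p20002_i0_a0` (field, instance, array), that field 20002 is the self-reported non-cancer illness code, that the participant ID column is called `eid`, and that `1154` is the code for irritable bowel syndrome. Please check these against your own export and the data showcase before relying on them. The sample data is made up.

```python
import csv
import io
import re

# Assumed column naming: p<field>_i<instance>_a<array>, e.g. p20002_i0_a0.
FIELD_COLUMN = re.compile(r"^p20002_i(\d+)_a(\d+)$")


def participants_with_code(rows, code):
    """Return the set of participant IDs that have `code` in any
    instance/array column of the field matched by FIELD_COLUMN."""
    matched = set()
    for row in rows:
        for column, value in row.items():
            # Skip malformed rows where DictReader has no column name.
            if not column or not FIELD_COLUMN.match(column):
                continue
            if (value or "").strip() == code:
                matched.add(row["eid"])  # "eid" is an assumed ID column name
                break  # one match per participant is enough
    return matched


# Tiny made-up sample standing in for an exported CSV file.
sample_csv = """eid,p20002_i0_a0,p20002_i0_a1,p20002_i1_a0
1000001,1154,,
1000002,1111,1154,
1000003,,,1154
1000004,1111,,
"""

reader = csv.DictReader(io.StringIO(sample_csv))
# "1154" is assumed here to be the irritable bowel syndrome code.
ibs_participants = participants_with_code(reader, code="1154")
print(sorted(ibs_participants))
```

With a real export you would replace the `io.StringIO(sample_csv)` part with an open file handle. Matching the columns by pattern means you do not have to list each instance and array combination by hand.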
Thank you for the help. I will try the Table Exporter tool.