How can I exclude participants who have specific diagnoses from my cohort using the Cohort Browser?
I want to exclude from my cohort participants who have a specific cancer type, CVD, or certain dementia subtypes. I have found the ICD-10 codes for these diseases. In the Cohort Browser, when I use the cohort filter on the phenotype fields to exclude them, I noticed that there are 57 different fields for ICD-10.
These are:
Health-related outcome
i. Hospital inpatient
1) Record-level access
Hospital Diagnosis Record
Diagnoses - ICD10
2) Summary Diagnoses
- Diagnoses - ICD10
- Diagnoses - main ICD10
- Diagnoses - secondary ICD10
- External causes - ICD10
ii. Death register
1) Underlying (primary) cause of death: ICD10 | Instance 0
2) Underlying (primary) cause of death: ICD10 | Instance 1
3) Contributory (secondary) causes of death: ICD10 | Instance 0 | Array 1 - 14
4) Contributory (secondary) causes of death: ICD10 | Instance 1 | Array 1 - 14
iii. Cancer register
Type of cancer: ICD10 | Instance 0 - 21
1) Which one of these data fields should I use to exclude all participants with the diseases I mentioned? Which of these data fields is the most comprehensive?
2) To exclude all participants with the diseases I mentioned, should I also filter by ICD-9 codes?
Comments
To find ALL participants that have ever had a particular disease, you would need to use almost ALL the fields.
However, for non-Cancer diseases you might find Category 1712 First Occurrences fields will help you to take a shortcut. These are based on Primary Care, Hospital, Death and Self-report fields, and they include both icd9 and icd10.
If the First Occurrences do not have enough detail, and if you do need to use the many fields listed above, then for each relevant field, you will need every Instance and every Array. For more information on Instances and Arrays, see the Data Access Guide section 2.5 The structure of a main dataset.
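As a rough illustration of what "every Instance and every Array" means if you work on an exported table instead of the Cohort Browser, here is a minimal Python sketch. The column naming pattern (name_i<instance>_a<array>), the field name death_secondary_icd10 and the sample rows are all assumptions made up for this example; check how your own export names its columns.

```python
import re

def columns_for_field(columns, name):
    """Return every column of one field, across all instances and arrays.

    Assumption: columns are named like '<name>_i0' or '<name>_i0_a3'.
    """
    pattern = re.compile(rf"^{re.escape(name)}_i\d+(_a\d+)?$")
    return [c for c in columns if pattern.match(c)]

def has_code(row, columns, prefixes):
    """True if any of the columns holds a code starting with one of the prefixes."""
    prefixes = tuple(prefixes)
    for col in columns:
        value = row.get(col)
        if value and value.startswith(prefixes):
            return True
    return False

# Made-up rows: one dict per participant, empty string means no value.
rows = [
    {"eid": "1", "death_secondary_icd10_i0_a1": "I251", "death_secondary_icd10_i1_a1": ""},
    {"eid": "2", "death_secondary_icd10_i0_a1": "", "death_secondary_icd10_i1_a1": "F03"},
    {"eid": "3", "death_secondary_icd10_i0_a1": "J449", "death_secondary_icd10_i1_a1": ""},
]

cols = columns_for_field(rows[0].keys(), "death_secondary_icd10")
flagged = [r["eid"] for r in rows if has_code(r, cols, ["I25", "F0"])]
print(flagged)
```

Participant 2 is only caught because instance 1 is checked as well as instance 0, which is the reason for needing every Instance and Array.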
The Summary Diagnoses fields are derived from the Hospital inpatient Record-level access, so if you use the Summary Diagnoses fields you will not need to use the Hospital inpatient Record-level access. The Diagnoses - main and Diagnoses - secondary fields contain the same information as Diagnoses, so you will not need all three fields. A participant will only have a Hospital record if they have been a hospital in-patient.
In general, if somebody has died of a particular ICD 10 disease, it is likely that they will have been to hospital with the same ICD 10 disease, but there could be a few participants that were undiagnosed until after death, so you need the death register fields too.
If the disease is Cancer, then the Cancer Register is most likely to have the data. If the disease is not Cancer, then the Summary Diagnoses fields are most likely to have the data.
Yes, ideally you will need to filter with ICD 9 codes too. (You will find that there are not very many of these, as the ICD 9 codes were only used in a few places several years ago.)
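The logic of combining the sources can be sketched in Python as follows. This is only an illustration under stated assumptions: the participant IDs and records are made up, the source names are hypothetical, and the code prefixes (ICD-10 C50, I21, F00 and ICD-9 174, 410, 290) are examples, not a validated list for any study.

```python
# Illustrative prefixes only; build your own lists from the ICD-10 and ICD-9 codings.
ICD10_PREFIXES = ("C50", "I21", "F00")
ICD9_PREFIXES = ("174", "410", "290")

# Hypothetical per-source data: participant id -> list of codes recorded in that source.
sources = {
    "hospital_icd10": ({"1": ["I219"], "2": ["J459"]}, ICD10_PREFIXES),
    "death_icd10": ({"3": ["F009"]}, ICD10_PREFIXES),
    "cancer_icd10": ({"4": ["C509"]}, ICD10_PREFIXES),
    "hospital_icd9": ({"5": ["4109"], "2": ["4280"]}, ICD9_PREFIXES),
}

def participants_with(records, prefixes):
    """Participants who have at least one code starting with one of the prefixes."""
    return {
        eid
        for eid, codes in records.items()
        if any(code.startswith(prefixes) for code in codes)
    }

# A participant is excluded if ANY source records the disease.
to_exclude = set()
for records, prefixes in sources.values():
    to_exclude |= participants_with(records, prefixes)

cohort = {"1", "2", "3", "4", "5"}
remaining = cohort - to_exclude
print(sorted(to_exclude), sorted(remaining))
```

Participant 3 appears only in the death register and participant 5 only in the ICD-9 hospital data, so dropping either source would leave them in the cohort.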
There are some other fields that you might need to check as well. When participants attend a Visit, they answer questions about all the diseases that they have ever had. The self-report data about Cancer is in field 20001. It uses a different coding, specific to UK Biobank, found in encoding 3. The self-report data about other diseases is in field 20002, with encoding 6. For many participants, this data is quite old (pre-2010), so it might not be relevant to your study. For the participants who attended Imaging visits (instance 2 and instance 3) the data is more recent.
For more information on the Death records (including the fact that the extra instance occurs when a single participant gets two death certificates for the same death) see Resource 115559.
For more information on the Cancer Registry linked data, see Resource 115558.
For more information on the Hospital linked data, see Category 2000.
If you are interested in Asthma, COPD, Dementia, Renal disease, Motor Neurone disease, Myocardial infarctions, Parkinson's disease or Stroke, then see also Category 42.
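If you export the self-report fields, a small Python sketch like the one below could show at which visits a condition was reported, so you can judge how recent the information is. The column naming (p<field>_i<instance>_a<array>) is an assumption, and the codes 9001 and 9002 are placeholders, not real values from encoding 6; look up the real values in the coding itself.

```python
import re

# Assumed column naming: p<field>_i<instance>_a<array>, e.g. p20002_i2_a0.
COLUMN = re.compile(r"^p(?P<field>\d+)_i(?P<instance>\d+)_a(?P<array>\d+)$")

def reported_instances(row, field_id, codes):
    """Return the sorted instances at which any of the codes was self-reported."""
    found = set()
    for col, value in row.items():
        match = COLUMN.match(col)
        if match and int(match["field"]) == field_id and value in codes:
            found.add(int(match["instance"]))
    return sorted(found)

# Made-up participant; "9001" and "9002" are placeholder codes.
row = {
    "eid": "7",
    "p20002_i0_a0": "9001",
    "p20002_i0_a1": "9002",
    "p20002_i2_a0": "9002",
    "p20001_i0_a0": "",
}

instances = reported_instances(row, 20002, {"9002"})
reported_at_imaging = any(i >= 2 for i in instances)  # instances 2 and 3 are Imaging visits
print(instances, reported_at_imaging)
```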
Hi Rachael W,
Thank you so much for your explanation. It was very helpful. I'm trying to create my cohort using the data fields you mentioned, and I have another question. The ICD9 codes are not detailed enough for my study: I couldn't find the diagnoses I wanted, so I want to exclude from my cohort all participants who have an ICD9 hospital diagnosis record. What is the easiest way to do this? Also, from which date were ICD-10 codes used? If I know that, I can filter by the date of attending the assessment centre (data field 53).
Hi,
For information about when and where the ICD 9 codes were used, see https://biobank.ndph.ox.ac.uk/showcase/exinfo.cgi?src=Data_providers_and_dates
All the English hospitals, all the Welsh hospitals, and the Scottish Hospitals after 1996, use ICD 10 codes.
This means that some Scottish participants will have their early hospital records using ICD 9 and their later hospital records using ICD 10.
Are you sure you want to exclude all of the participants with any ICD 9 codes?
Or do you mean that you want to exclude only the participants with an ICD 9 code for the specific cancers, CVD or dementia?
Most UKB participants have no ICD 9 Hospital Diagnosis records. If you look in field 41271 Summary Diagnoses ICD9, which is in Health-related Outcomes > Hospital Inpatient > Summary Diagnoses, and set the Filter as "Is Not Null" you should see ICD 9 codes for approximately 20390 participants. All the other participants have "Null" in this field 41271. A Null value in this field means either that the participant did not live in Scotland between 1981 - 1996, or that they did live in Scotland but did not need to go into hospital at all during that time.
Save this cohort, with a name such as icd9_not_null.
To exclude these icd9_not_null participants from your main cohort, use Combine Cohort: Subtract.
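In set terms, the "Is Not Null" filter followed by Subtract amounts to the following minimal Python sketch. The participant IDs and codes are made up, and None stands for a Null value in field 41271.

```python
# Made-up data: participant id -> ICD9 summary diagnoses (None means Null in field 41271).
summary_icd9 = {"1": None, "2": ["4109"], "3": None, "4": ["2500"]}

# "Is Not Null" filter: everyone with at least one ICD9 hospital diagnosis.
icd9_not_null = {eid for eid, codes in summary_icd9.items() if codes}

# Combine Cohort: Subtract, expressed as a set difference.
main_cohort = {"1", "2", "3"}
remaining = main_cohort - icd9_not_null
print(sorted(icd9_not_null), sorted(remaining))
```

Participant 4 is in icd9_not_null but was never in the main cohort, so subtracting has no effect for that ID.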
If you also want to exclude all participants with an ICD9 record in the Cancer Register, you can do a similar cohort using field 40013 Type of Cancer ICD9. However, there are 15 Instances, and you will need them all. You can either create 15 cohorts, and then combine each with your main cohort, or combine all 15 filters to make one cohort.
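The two approaches give the same result, which the following Python sketch illustrates with made-up participant IDs. The mapping from instance number to the participants with a non-null value is hypothetical, and only a few of the 15 instances are populated to keep the example short.

```python
N_INSTANCES = 15  # field 40013 has 15 Instances

# Made-up data: instance -> participants with a non-null ICD9 cancer code in that instance.
not_null_by_instance = {0: {"2", "5"}, 1: {"5"}, 14: {"9"}}

main_cohort = {"1", "2", "3", "5", "9"}

# Option A: 15 separate cohorts, each subtracted from the main cohort in turn.
one_by_one = set(main_cohort)
for instance in range(N_INSTANCES):
    one_by_one -= not_null_by_instance.get(instance, set())

# Option B: join all 15 filters with OR into one cohort, then subtract once.
any_instance = set().union(*not_null_by_instance.values())
all_at_once = main_cohort - any_instance

print(sorted(one_by_one), sorted(all_at_once))
```

Option B needs only a single subtraction from the main cohort.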
If you exclude all the participants with any not-null ICD9 hospital record, you will be excluding most of the Scottish participants, just because they went to hospital between 1981 and 1996.
(Notice that English hospitals didn't provide any data at all for 1981 - 1996).
It will not be possible to use data field 53 to filter out the ICD9 codes. Field 53 is the date that a participant signed up to be part of the UKB study, and attended an assessment centre to be measured and to provide a blood sample. Field 53 is nothing to do with the dates of the hospital visits. Data about the hospital visits is provided directly to UKB by the NHS. See Category 2000 and Resource 138483 for more details. Data coding 87 lists the possible ICD9 codes used.
This reference item might be useful for finding which ICD 9 codes to look for:
Resource 592 https://biobank.ndph.ox.ac.uk/showcase/ukb/auxdata/primarycare_codings.zip provides mappings between different coding systems.
Thank you so much for your answers. I do want to exclude all of the participants with any ICD9 codes. When I try using Combine Cohort, I see the warning "This cohort cannot be combined a second time". Is there an alternative way?
These filters would get rid of anyone with any ICD9 code for Cancer or any ICD9 code for Hospital Diagnoses: