If I want to obtain the first diagnosis date of patients with ICD code K70.0 (Alcoholic fatty liver) from the "Summary Diagnoses," which online tool should I use?
If I want to obtain the first diagnosis date of patients with ICD code K70.0 (Alcoholic fatty liver) from the "Summary Diagnoses," which online tool should I use? There are too many diagnosis date from Array0-258, how i can find the first diagnosis date of K70.0? Thanks a lot
Comments
9 comments
Hi Shiqi,
Table-exporter is a helpful tool to extract the date of ICD-10 diagnosis. You may find the A110_Export_participant_data.Rmd notebook helpful for using table-exporter on RStudio. I hope this helps.
Thank you for using the community forum.
Hi Shiqi,
have you found a solution for your problem?
I'm facing the same issue as I want to get the date of a specific ICD code. I've tried what Lea has suggested, but I want to find a way to match a specific ICD code (from data field p41270) with the corresponding date (which can be in any array of p41280).
Thank you!
Hi Giovanni,
After extracting the data from fields 41270 and 41280, you can match a specific ICD code with its corresponding date using the array structure. The matching is based on the same eid, instance and array index, ensuring that the ICD code aligns with the correct date.
Hope this helps.
Hi Lea,
thanks a lot for your answer. I'm sorry to keep bothering you but I still don't understand. When I look at the data using the Cohort browser, field 41270 just has a single variable containing all the ICD codes of each patient, while field 41280 has multiple arrays (from 0 to 258). Same thing when I open the data using R.
How can I match ICD codes from field 41270 (which is just a long list of variables) with the dates from each array of field 41280?
Thank you.
Hi Giovanni,
After extracting data from field 41270, the different ICD-10 code instances for each participant are stored in a single column, separated by a pipe (“|”). The order of these diagnoses corresponds to the array number for the date of diagnosis from field 41280. Therefore, you can split the diagnoses into separate columns and match them to the corresponding array numbers.
For instance, if you extracted data from field 41270 using table-exporter, with the column name “Diagnoses - ICD10”, you can find the following R code useful (with `table_exporter_data` being the name of your dataframe):
library(tidyverse)
# Get the number of instances - this will correspond to the number of columns for each instance
max_instances <- table_exporter_data$`Diagnoses - ICD10` %>%
replace_na("") %>%
str_count("\\|") %>% # Count the number of pipes
max(na.rm = TRUE) + 1
# Perform the separation based on the pipe operator
icd10_instances <- table_exporter_data %>%
separate(
`Diagnoses - ICD10`,
into = paste0("Diagnoses - ICD10 | Array ", 0:(max_splits - 1)),
sep = "\\|",
fill = "right"
)
Additionally, you can extract this from the hesin record tables. The hesin_diag table can be useful. You can extract this data using table exporter, by replacing the entity “participant” to “hesin_diag”.
I hope this helps. Thank you for using the community forum.
Hi Lea,
Just a quick clarification.
(a) When the data from field 41270 is retrieved with data-exporter under the entity “participant”, you do indeed get the different ICD-10 code instances for each participant stored in a single column, separated by a pipe (“|”). These are in alphanumeric order and can be separated into arrays.
(b) However, when you retrieve the same field (41270) under the entity “hesin_diag”, you get a one row per participant per code *and* the order of these codes per participant is not alphanumeric.
Given this, when trying to merge with the diagnoses dates (41280) which are in array format, one needs to know which order the date arrays take. Do they match with the data from (a) the order of the piped field or with which are alphanumeric or (b) the order of the non piped data from hesin_diag which are not?
Thanks
Hi Maria,
Sorry for the delayed response. When trying to merge the diagnoses from field 41280, the order matches the order of the piped 41270 field.
The data in the hesin_diag table follows a different convention to the fields extracted in the participant table. Resource 138483 includes information on how the data in this table was derived.
Hope this helps. Thank you for using the Community Forum.
Dear UK Biobank Support,
I would like to confirm how Field 41270 maps to Field 41280. If a participant’s Field 41270 is exported as a pipe-delimited string such as: “A100|B100|C100 does” this mean that the corresponding dates in Field 41280 are: A100 → p41280_a0 B100 → p41280_a1 C100 → p41280_a2 In other words, does the mapping start from array index a0? Thank you very much for your help.
Hi Chen, please find a great explanation from Lea posted a year ago.
https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/23567724254621/comments/24561432285725
Hope this helps. Thank you for using the Community Forum.
Please sign in to leave a comment.