Processing incident events: need help

I am trying to generate time-to-event and event variables for specific phenotypes based on ICD codes. I got results from two versions of my code, and they show different counts. I would also like to ask how to deal with event dates that fall before the date of recruitment. This is the code I used. Can someone suggest changes, if any?

 

### Events
library(dplyr)
library(stringr)

bd1 <- bd1 %>%
  mutate(
    hf = case_when(
      str_detect(p40001_i0, "I500|I501|I509") ~ "hf_40001_icd10",
      str_detect(p40001_i1, "I500|I501|I509") ~ "hf_40001_icd10",
      str_detect(p41202, "I500|I501|I509") ~ "hf_41202_icd10",
      str_detect(p41204, "I500|I501|I509") ~ "hf_41204_icd10",
      str_detect(p41270, "I500|I501|I509") ~ "hf_41270_icd10",
      str_detect(p41201, "I500|I501|I509") ~ "hf_41201_icd10",
      str_detect(p41203, "4280|4281") ~ "hf_41203_icd9",
      str_detect(p41205, "4280|4281") ~ "hf_41205_icd9",
      str_detect(p41271, "4280|4281") ~ "hf_41271_icd9",
      str_detect(p20002_i0, "1076") ~ "hf_20002_self",
      str_detect(p20002_i1, "1076") ~ "hf_20002_self",
      str_detect(p20002_i2, "1076") ~ "hf_20002_self",
      str_detect(p20002_i3, "1076") ~ "hf_20002_self",
      !is.na(p131354) ~ "heart_failure_first_occurence"
    ),
    # Binary case indicator used below: 1 if any source matched, 0 otherwise
    HF = as.integer(!is.na(hf))
  )

 

# Convert the date fields to Date class
bd1$p131354 = as.Date(bd1$p131354)
bd1$p40000_i0 = as.Date(bd1$p40000_i0)
bd1$p41262_a0 = as.Date(bd1$p41262_a0)
bd1$p41263_a0 = as.Date(bd1$p41263_a0)
bd1$p53_i0_v1 = as.Date(bd1$p53_i0, "%Y-%m-%d")

 

 

##### Time to Events for HF
## Controls
non_hf = bd1[which(bd1$HF == 0),]
# Controls are followed up to a common end date
non_hf$date_to_hf = as.Date("2022-02-16")

 

## Cases
hf = bd1[which(bd1$HF == 1),]
# Earliest non-missing date per row; pmin() keeps the Date class, whereas
# apply() first coerces the data frame to a character matrix
hf$date_to_hf = do.call(pmin, c(hf[c("p131354", "p40000_i0", "p41262_a0", "p41263_a0")], na.rm = TRUE))

 

# Remove individuals whose disease date is before the first date of recruitment
hf1 = hf[which(hf$date_to_hf >= as.Date("2006-03-13")),]

 

## Columns Subset & Merge files (non_hf, hf1)
hf1 = hf1[,c("eid","HF","date_to_hf","p53_i0_v1")]
non_hf = non_hf[,c("eid","HF","date_to_hf","p53_i0_v1")]
final_hf = rbind(hf1,non_hf)

 

### Difference in Time based on baseline date of recruitment (in days)
final_hf$tot_tty_hf = final_hf$date_to_hf - final_hf$p53_i0_v1

 

 

### Remove individuals with difftime <= 0 (event on or before baseline) & Convert days to years
final_hf1 = final_hf[which(final_hf$tot_tty_hf > 0),]
final_hf1$tot_tty_hf = as.numeric(final_hf1$tot_tty_hf)/365.25
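
The question about event dates that fall before recruitment is commonly handled by treating those participants as prevalent cases (they already had heart failure at baseline) and excluding them from an incident time-to-event analysis, or analysing them separately. Because recruitment dates differ between participants, the comparison is usually made against each person's own baseline visit (p53_i0 here); since every baseline is on or after the first recruitment date, that per-participant check already covers the fixed 2006-03-13 cut-off. The sketch below illustrates this with hypothetical toy data: the columns event_date, death_date and baseline are made-up stand-ins for date_to_hf, p40000_i0 and p53_i0_v1. It also assumes that controls who died before 2022-02-16 should be censored at their death date; whether that suits a given analysis is a judgement call.

```r
# Minimal sketch with hypothetical toy data. event_date, death_date and baseline
# stand in for date_to_hf, p40000_i0 and p53_i0_v1 in the code above.
end_of_followup <- as.Date("2022-02-16")   # common censoring date from the post

d <- data.frame(
  eid        = 1:4,
  HF         = c(1, 1, 0, 0),
  event_date = as.Date(c("2005-01-01", "2012-06-01", NA, NA)),
  death_date = as.Date(c(NA, NA, "2015-01-01", NA)),
  baseline   = as.Date(c("2008-01-01", "2008-01-01", "2009-01-01", "2010-01-01"))
)

# Prevalent case: first HF date on or before this participant's own baseline visit
d$prevalent <- d$HF == 1 & !is.na(d$event_date) & d$event_date <= d$baseline

# Exit date: HF date for cases; for controls, death or end of follow-up,
# whichever comes first (ASSUMPTION: controls who died are censored at death)
d$exit_date <- pmin(d$death_date, end_of_followup, na.rm = TRUE)
d$exit_date[d$HF == 1] <- d$event_date[d$HF == 1]

# Incident analysis: drop prevalent cases, then follow-up time in years
incident <- d[!d$prevalent, ]
incident$time_years <- as.numeric(incident$exit_date - incident$baseline) / 365.25
incident
```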

 

Comments

8 comments

  • Just thought I would ask for feedback.

  • Rachael W (UKB Community team, Data Analyst)

    Field 41262_a0 holds the date of the first occurrence of the ICD-10 code in field 41202_a0.

     

    In most cases, the ICD-10 code I50 in field 41202 will not be in _a0 but in some other position in the array. You need to match the array indices so that each ICD-10 code is paired with its correct date.

    For more information on array values, see Section 2.5 of the Data Access Guide https://biobank.ndph.ox.ac.uk/~bbdatan/Data_Access_Guide_v3.1.pdf

  • Rachael W, thank you so much for the information. I will look into it and come back here if I have any more queries.

  • I have downloaded column "41262_a0", which contains the dates of first occurrence, as you mentioned. I was also able to download a column titled "Diagnoses - main ICD10" with the field name "p41202". Just to make sure I have this right: is 41262_a0 the date corresponding to the diagnosis described in "Diagnoses - main ICD10"?

  • Rachael W (UKB Community team, Data Analyst)

    On the Showcase (in baskets), Diagnoses - main ICD10 is field 41202; it is arrayed, and it relates to the arrayed date field 41262.

     

    On the RAP Cohort Browser, Diagnoses - main ICD10 contains a comma-separated list of several diagnoses, in alphanumeric order of ICD-10 code. The first item in the list should be treated as _a0, and its date will be in 41262_a0.

    This is not obvious, and I can't find the documentation that explains it, but I have checked for a specific participant with very many diagnoses, and it works.

     

    The (previously-arrayed) field 41202 has been condensed on the RAP cohort browser to make it possible to search for a specific diagnosis without having to search each array column individually.

     

    If you would like to test this for yourself, you can use the record-level data in

    Hospital Inpatient > Record level access > Hospitalization record > Episode start date

    and

    Hospital Inpatient > Record level access > Hospital diagnosis record > Diagnoses ICD10

    relating the two tables by participant ID and instance index (this is not the same "instance" as the one used to specify the assessment centre visit).

     

    So it is important to get the array index right. It is also important not to confuse the Main, Secondary, and Summary diagnosis data sets.

  • Marie Winther-Sørensen

    Rachael W, thank you for explaining the structure of the ICD-10 codes on the platform.

    Is there a way to get all the columns with the ICD dates, _a0 to _a240 (or whatever the highest is), into a cohort? I cannot find a guide for this, but I believe there must be a smarter way than adding them in the Cohort Browser one by one…

  • Rachael W (UKB Community team, Data Analyst)

    Hi Marie, the Cohort Browser is useful for exploratory work, but it has limitations of size and functionality. To use 240 columns, or to select rows in a complicated way, you will need a different approach. For example, you could extract all the columns you need into a CSV file, then open JupyterLab and use R (or Python, if you prefer) to read and manipulate the data in the CSV. This thread has a bit more information on the CSV approach: https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/19671290524317-How-to-extract-all-the-phenotypes-available-for-a-single-individual

  • Rachael W (UKB Community team, Data Analyst)

    By the way, for new questions you are more likely to receive an answer if you use a New Post (see the button at the top of the page) rather than adding a comment to an old thread. This is particularly relevant to threads older than January 2024, as they have been copied over from the old DNAnexus UKB forum and no longer send alerts to the followers of the thread.

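
The array matching described in the comments above can be sketched in R. This is a minimal sketch under several assumptions: the extract is a CSV in which p41202 holds all main ICD-10 codes for a participant as one comma-separated string (as described for the RAP Cohort Browser); the k-th code in that list pairs with date column p41262_a(k-1), which extrapolates from the statement that the first item corresponds to _a0 and is worth spot-checking; and codes are stored without dots (I500, as in the post). The file, its contents and the helper name first_matching_date are hypothetical. The sketch also selects every p41262_a* column by pattern rather than one at a time, which is one way to handle the _a0 to _a240 question once the columns are in a CSV.

```r
# Minimal sketch under stated assumptions. The CSV layout and all values are
# hypothetical toy data, not a real UK Biobank extract.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "eid,p41202,p41262_a0,p41262_a1,p41262_a2",
  '1,"I10,I500,K20",2008-01-01,2012-03-04,2015-06-07',
  "2,E119,2010-05-05,,",
  "3,,,,"
), csv_file)
bd <- read.csv(csv_file, colClasses = "character")

# Select every p41262_a* date column by pattern (works for _a0 ... _a240 alike)
# and order them numerically so that _a10 comes after _a9, not after _a1.
date_cols <- grep("^p41262_a[0-9]+$", names(bd), value = TRUE)
date_cols <- date_cols[order(as.integer(sub("^p41262_a", "", date_cols)))]
bd[date_cols] <- lapply(bd[date_cols], as.Date, format = "%Y-%m-%d")

# Hypothetical helper: earliest date among list positions whose code matches.
# ASSUMPTION: the k-th code in the list (1-based) pairs with column _a(k-1),
# i.e. with element k of `dates` once the columns are ordered as above.
first_matching_date <- function(codes_str, dates, pattern) {
  if (is.na(codes_str) || codes_str == "") return(as.Date(NA))
  codes <- trimws(strsplit(codes_str, ",", fixed = TRUE)[[1]])
  hits <- which(grepl(pattern, codes))
  hits <- hits[hits <= length(dates)]   # ignore positions with no date column
  hit_dates <- dates[hits]
  if (length(hit_dates) == 0 || all(is.na(hit_dates))) return(as.Date(NA))
  min(hit_dates, na.rm = TRUE)
}

# Apply row by row; do.call(c, ...) keeps the result in Date class.
bd$hf_main_date <- do.call(c, lapply(seq_len(nrow(bd)), function(i) {
  row_dates <- do.call(c, unname(as.list(bd[i, date_cols])))
  first_matching_date(bd$p41202[i], row_dates, pattern = "^I50[019]$")
}))

bd[, c("eid", "p41202", "hf_main_date")]
```

The row-by-row loop favours readability over speed; for hundreds of thousands of participants a reshaped (long-format) table is likely faster.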

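
For the record-level check suggested in the thread, the hospitalization table and the diagnosis table are related by participant ID and instance index. A minimal sketch with hypothetical toy tables; the table and column names (hesin, hesin_diag, eid, ins_index, epistart, diag_icd10) are assumptions chosen for illustration and may not match an actual export. This toy version does not separate main from secondary diagnoses, which the thread warns not to confuse.

```r
# Hypothetical toy tables; names and values are assumptions for illustration.
hesin <- data.frame(
  eid       = c(1, 1, 2),
  ins_index = c(0, 1, 0),
  epistart  = as.Date(c("2009-02-01", "2013-07-15", "2011-11-30"))
)
hesin_diag <- data.frame(
  eid        = c(1, 1, 1, 2),
  ins_index  = c(0, 1, 1, 0),
  diag_icd10 = c("I10", "I500", "E119", "I509"),
  stringsAsFactors = FALSE
)

# Join each diagnosis to its hospital episode by participant AND instance index
hes <- merge(hesin_diag, hesin, by = c("eid", "ins_index"))

# Keep heart-failure codes, then the earliest episode start per participant
hf_hes <- hes[grepl("^I50[019]", hes$diag_icd10), ]
hf_hes <- hf_hes[order(hf_hes$eid, hf_hes$epistart), ]
first_hf <- hf_hes[!duplicated(hf_hes$eid), c("eid", "epistart")]
first_hf
```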