I am trying to figure out how to subset the protein expression dataset to participants that were randomly selected. Is there a field that denotes whether or not a participant with proteomics data is randomly selected vs preselected by the consortium?

I would like to exclude the 10% of participants preselected by the consortium from my analysis.

Comments

14 comments

  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    No, at present there is not a field for that.

    Once the exclusivity period on the second tranche of Olink data (due for release in Q4 2023) has ended, we will provide a flag for participants selected by consortium partners. Covid imaging participants can already be identified using the instancing (visit 3) and/or complementary data fields within Showcase, and so will not be flagged in the same way.

    1
  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    Please note that the UK Biobank participants as a whole are already not representative of the general UK population, as they are not a random sample of the population.

    0
  • Comment author
    Tin Oreskovic

    Hello! Just following up regarding the flag indicating that a participant has been selected by consortium partners vs randomly selected: now that the exclusivity period has ended, will this flag be provided?

    Thank you!

    1
  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    Yes, it is available: see Field 30903 (https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=30903). On the RAP, it is in the folder Biological samples > Blood assays > Proteomics > Protein biomarkers.

    1
  • Comment author
    Tin Oreskovic

    Thank you!

    0
  • Comment author
    Caterina Felici

    Hi,

    I am using data field 41000 to identify Covid imaging participants, and some participants still appear even when I subset to Instance 0 observations only. Is data field 41000 the right way of identifying them? I am guessing that plasma samples from participants in the Covid repeat imaging study were also analysed from instance 0?

    0
  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    Hi Caterina,

    All the participants in the Covid imaging study had already attended a baseline visit (instance 0) and an initial imaging visit (instance 2). They were invited back for a repeat imaging visit (instance 3).

    Only 2096 participants have non-null values in field 41000, and these are the “Covid re-imaging participants”. They should all have lots of instance 0 data, and most of them would have provided blood samples at instance 0.

    If this doesn't answer your question, please explain in a bit more detail.

    0
  • Comment author
    Caterina Felici

    Hi Rachael, 

    Thank you very much for your prompt reply! 

    I am guessing, then, that field 41000 alone is sufficient to identify participants in the Pharma Proteomics Project who were non-randomly selected from the Covid re-imaging study?

    Best wishes, 

    Caterina

    0
  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    Hi Caterina, 

    Yes, field 41000 is sufficient to identify those participants who were included in the PPP (Olink) assay project specifically because they were also part of the Covid re-imaging study.

    I just had a look in the cohort browser at these 2096 participants and their values for field 30900 (number of proteins assayed) for each of the instances. It appears that many of them have all of their i0, i2 and i3 samples assayed (far more than would have occurred at random), which is interesting to me, as I had assumed that only the i2 and i3 samples would have been specially selected. There are also several of these 2096 participants who do not have any non-null values for field 30900, so it seems that the Covid re-imaging participants were not all included in the PPP project.

    Of the 2096 participants:
    none have i1 samples assayed;
    1323 have i0 samples assayed;
    1172 have i2 samples assayed;
    1123 have i3 samples assayed;
    1006 have all of their i0, i2 and i3 samples assayed.

    0
  • Comment author
    Amy Elizabeth Packer

    Hi! I have a question about the random sampling of the subset of 46,595 UK Biobank participants. 

    My question is: was the random sample selected from all 502,414 UKB participants, or from the participants remaining after the initial 5,500 samples had been pre-selected by the Consortium members?

    It’s not entirely clear to me which pool of participants the random sample was selected from after reading the ‘Plasma proteomic associations with genetics and health in the UK Biobank’ paper (Sun et al., 2023) or corresponding ‘Supplementary information’ where it describes that “initially 5,500 samples collected from participants during their baseline recruitment visit were pre-selected by the Consortium members. 44,502 further representative participant samples were selected from the UK Biobank (UKB) cohort…”.

    Thanks!

    0
  • Comment author
    Rachael W (UKB Community team, Data Analyst)

    Hi Amy,

    the random selection was done on the available samples, not on the available participants.

    Most (~80%) UKB participants have provided only one sample.  A few have provided no samples. Some have provided two or more samples, one sample per visit.

    Samples that were selected by the Consortium were not included in the pool for the random selection.   However, where a participant had provided a sample at a baseline visit and another sample at an imaging visit, and their baseline sample was selected by the Consortium, then their imaging sample would be in the pool.

    There are some Resource documents available in the “15 Resources” tab for Proteomics Category 1839.

    Thank you for using the forum.

    1
  • Comment author
    Neil Wright

    Quoting Rachael W:

    “Yes, field 41000 is sufficient to identify those participants who were included in the PPP (Olink) assay project specifically because they were also part of the Covid re-imaging study.

    I just had a look in the cohort browser at these 2096 participants and their values for field 30900 (number of proteins assayed) for each of the instances. It appears that many of them have all of their i0, i2 and i3 samples assayed (far more than would have occurred at random)”

    Hello,

    I just wanted to follow up on identifying participants who are part of the UKB-PPP sample because their baseline samples were randomly selected (as opposed to COVID-19 imaging participants). Is there overlap between ‘COVID-19 imaging participants’ and ‘randomly selected’? If so, how can the ‘randomly selected’ participants be identified?

    Related to this, participants identified by field 41000 have instance 0 samples in all batches 0-7, but their instance 2 and 3 samples are confined to batch 7. Looking at the schematic in the ‘NPX calculation and normalization’ section of the supplement to the Sun et al. paper, I'm not clear whether ‘Set 2’ is ALL samples for ‘COVID-19 imaging participants’ (so that the samples in batches 0-6, and some in batch 7, are overlap between the random sample and the COVID participants) or instance 2 and 3 samples only.

    Thank you and regards,

    Neil

    1
  • Comment author
    Amy Elizabeth Packer

    Hi, 

    I have a related question. If I understand correctly, in the UKB-PPP paper (Sun et al., 2023) it mentions that some plates from batches 0–6 were normalized separately. Is there a way to identify these samples?

    Reading the supplementary information it is not clear to me how these could be identified. 

    0
  • Comment author
    Yiran Li

    I have the same question as Amy: how can the plates that were normalized separately be identified?

    1
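
As a rough illustration of the filtering discussed in this thread, the sketch below keeps participants with a baseline (instance 0) assay who are neither flagged as consortium-selected nor part of the Covid re-imaging group, and also tallies assay coverage per instance for the re-imaging group. It is a sketch under stated assumptions, not an official UK Biobank method: the column names are hypothetical stand-ins for Fields 30903, 41000 and 30900; the value that marks a consortium-selected participant (`"1"` here) is a guess that should be checked against the field's coding on Showcase; and dropping every participant with a non-null field 41000 is a deliberately conservative choice, since the thread does not settle whether that group overlaps with the random selection.

```python
from typing import Any, Dict, List

# Hypothetical column names: real exports label columns differently,
# so map these to whatever your own table uses.
FLAG_COL = "consortium_selected"   # stand-in for Field 30903
COVID_COL = "covid_reimaging"      # stand-in for Field 41000
N_PROTEINS_COLS = {                # stand-ins for Field 30900, one per instance
    0: "n_proteins_i0",
    2: "n_proteins_i2",
    3: "n_proteins_i3",
}

Row = Dict[str, Any]  # one participant per row; missing values are None or ""


def is_missing(value: Any) -> bool:
    """Treat None and empty strings as null."""
    return value is None or value == ""


def has_assay(row: Row, instance: int) -> bool:
    """True if the participant has a non-null protein count at this instance."""
    return not is_missing(row.get(N_PROTEINS_COLS[instance]))


def randomly_selected_baseline(rows: List[Row], flag_selected: str = "1") -> List[Row]:
    """Keep rows with an instance 0 assay that are not consortium-selected
    and not Covid re-imaging participants.

    `flag_selected` is an ASSUMED coding for "selected by the consortium";
    a missing flag is treated as "not selected".
    """
    kept = []
    for row in rows:
        if not has_assay(row, 0):
            continue  # no baseline proteomics data
        if str(row.get(FLAG_COL)) == flag_selected:
            continue  # preselected by consortium partners (assumed coding)
        if not is_missing(row.get(COVID_COL)):
            continue  # Covid re-imaging participant (non-null field 41000)
        kept.append(row)
    return kept


def instance_coverage(rows: List[Row]) -> Dict[str, int]:
    """Count Covid re-imaging participants with an assay at each instance,
    plus those with assays at every listed instance."""
    covid = [r for r in rows if not is_missing(r.get(COVID_COL))]
    counts = {f"i{i}": sum(has_assay(r, i) for r in covid) for i in N_PROTEINS_COLS}
    counts["all"] = sum(all(has_assay(r, i) for i in N_PROTEINS_COLS) for r in covid)
    return counts
```

Given a list of per-participant dictionaries, `randomly_selected_baseline(rows)` returns the retained rows, and `instance_coverage(rows)` returns counts keyed `i0`, `i2`, `i3` and `all`, similar in spirit to the tallies reported from the cohort browser earlier in the thread.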
