I am trying to figure out how to subset the protein expression dataset to participants that were randomly selected. Is there a field that denotes whether or not a participant with proteomics data is randomly selected vs preselected by the consortium?
Comments
14 comments
No, at present there is not a field for that.
Once the exclusivity period has ended on the second tranche of Olink data (due for release in Q4 2023) we will be providing a flag of participants selected by consortium partners. Covid imaging ppts can already be determined using the instancing (visit 3) and/or complementary data fields within showcase, and as such will not be flagged in the same way.
Please note that the UK Biobank participants as a whole are already not representative of the general UK population, as they are not a random sample of the population.
Hello! Just following up regarding the flag indicating that a participant has been selected by consortium partners vs randomly selected: now that the exclusivity period has ended, will this flag be provided?
Thank you!
Yes, it is available: see Field 30903, https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=30903. On the RAP, it is in folder Biological samples > Blood assays > Proteomics > Protein biomarkers.
Thank you!
Hi,
I am using data field 41000 to identify Covid imaging ppts, and participants are still identified even when I subset to Instance 0 observations only. Is data field 41000 the right way of identifying them? I am guessing that plasma samples from participants in the Covid repeat imaging study were also analysed from instance 0?
Hi Caterina,
All the ppts in the Covid imaging study had already attended a baseline visit (instance 0) and an initial imaging visit (instance 2). They were invited back for a repeat imaging visit (instance 3).
Only 2096 ppts have non-null values in field 41000, and these are the “covid re-imaging ppts”. They should all have lots of instance 0 data, and most of them would have provided blood samples at instance 0.
If this doesn't answer your question, please explain in a bit more detail.
Hi Rachael,
Thank you very much for your prompt reply!
I am guessing then, that I can use field 41000 only (it is sufficient) to identify participants from The Pharma Proteomics Project that were non-randomly selected from the Covid re-imaging study?
Best wishes,
Caterina
Hi Caterina,
Yes, field 41000 is sufficient to identify those participants that were included in the PPP (olink) assay project specifically because they were also part of the Covid re-imaging study.
I just had a look in the cohort browser at these 2096 participants and their values for field 30900 (number of proteins assayed) for each of the instances. It appears that many of them have all of their i0, i2 and i3 samples assayed (far more than would have occurred at random), which is interesting to me as I had assumed that it would only be the i2 and i3 samples that would have been specially selected. There are also several of these 2096 participants who do not have any non-null values for field 30900, so it seems that the Covid re-imaging participants were not all included in the PPP project. I notice that none of the 2096 have i1 samples assayed. 1323 of the 2096 have i0 samples assayed. 1172 of the 2096 have i2 samples assayed. 1123 of the 2096 have i3 samples assayed. 1006 of the 2096 have all of i0, i2 and i3 samples assayed.
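For anyone who wants to reproduce this kind of tabulation outside the cohort browser, here is a minimal sketch. It assumes a participant-level table in which field 41000 and the per-instance values of field 30900 have been exported; the column names (`p41000`, `p30900_i0`, and so on) and the toy values are assumptions for illustration only, so check them against your own export.

```python
from typing import Optional

# Hypothetical column names; a real export may name these differently.
COVID_FIELD = "p41000"
ASSAY_COLS = {0: "p30900_i0", 2: "p30900_i2", 3: "p30900_i3"}

Row = dict[str, Optional[float]]


def is_covid_reimaging(row: Row) -> bool:
    """A participant counts as Covid re-imaging if field 41000 is non-null."""
    return row.get(COVID_FIELD) is not None


def assayed_instances(row: Row) -> set[int]:
    """Instances for which field 30900 (number of proteins assayed) is non-null."""
    return {inst for inst, col in ASSAY_COLS.items() if row.get(col) is not None}


def summarise(rows: list[Row]) -> dict[str, int]:
    """Count Covid re-imaging participants by which instances were assayed."""
    covid = [r for r in rows if is_covid_reimaging(r)]
    summary = {"covid_reimaging": len(covid)}
    for inst in ASSAY_COLS:
        summary[f"i{inst}_assayed"] = sum(inst in assayed_instances(r) for r in covid)
    summary["all_instances_assayed"] = sum(
        assayed_instances(r) == set(ASSAY_COLS) for r in covid
    )
    summary["none_assayed"] = sum(not assayed_instances(r) for r in covid)
    return summary


# Toy rows (made-up values, not UK Biobank data).
rows: list[Row] = [
    {"p41000": 1.0, "p30900_i0": 1463.0, "p30900_i2": 1463.0, "p30900_i3": 1463.0},
    {"p41000": 1.0, "p30900_i0": 1463.0, "p30900_i2": None, "p30900_i3": None},
    {"p41000": 1.0, "p30900_i0": None, "p30900_i2": None, "p30900_i3": None},
    {"p41000": None, "p30900_i0": 1463.0, "p30900_i2": None, "p30900_i3": None},
]

summary = summarise(rows)
```

The same non-null test on field 41000 can be inverted to drop the Covid re-imaging participants from an analysis set.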
Hi! I have a question about the random sampling of the subset of 46,595 UK Biobank participants.
My question is: was the random sample selected from all 502,414 UKB participants, or was the random selection from the participants remaining after the initial selection of 5,500 samples pre-selected by the Consortium members?
It’s not entirely clear to me which pool of participants the random sample was selected from after reading the ‘Plasma proteomic associations with genetics and health in the UK Biobank’ paper (Sun et al., 2023) or corresponding ‘Supplementary information’ where it describes that “initially 5,500 samples collected from participants during their baseline recruitment visit were pre-selected by the Consortium members. 44,502 further representative participant samples were selected from the UK Biobank (UKB) cohort…”.
Thanks!
Hi Amy,
The random selection was done on the available samples, not on the available participants.
Most (~80%) UKB participants have provided only one sample. A few have provided no samples. Some have provided two or more samples, one sample per visit.
Samples that were selected by the Consortium were not included in the pool for the random selection. However, where a participant had provided a sample at a baseline visit and another sample at an imaging visit, and their baseline sample was selected by the Consortium, then their imaging sample would be in the pool.
There are some Resource documents available in the “15 Resources” tab for Proteomics Category 1839.
Thank you for using the forum.
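The sample-level logic in the reply above can be illustrated with a small sketch. This is not the Consortium's actual selection procedure; the `Sample` class, the participant IDs and the draw size are all hypothetical, and the point is only that exclusion applies to individual samples rather than to whole participants.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    participant_id: int  # hypothetical ID, not a real UKB identifier
    instance: int        # visit at which the sample was provided


def random_pool(available: list[Sample], consortium_selected: set[Sample]) -> list[Sample]:
    """Samples eligible for random selection: everything the Consortium did not pick."""
    return [s for s in available if s not in consortium_selected]


# Participant 1 gave a baseline (instance 0) and an imaging (instance 2) sample.
available = [Sample(1, 0), Sample(1, 2), Sample(2, 0), Sample(3, 0)]

# The Consortium pre-selected participant 1's baseline sample only.
consortium_selected = {Sample(1, 0)}

pool = random_pool(available, consortium_selected)

# Participant 1's imaging sample stays in the pool even though
# their baseline sample was removed from it.
rng = random.Random(0)  # fixed seed so the toy draw is repeatable
drawn = rng.sample(pool, k=2)
```

In this toy setup a participant can therefore appear both as Consortium-selected (through one sample) and as randomly selected (through another).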
Hello,
I just wanted to follow-up on identifying participants that are part of the UKB-PPP sample because they were randomly selected baseline samples (as opposed to COVID-19 imaging participants). Is there overlap between ‘COVID-19 imaging participants’ and ‘randomly selected’? If so, how can the ‘randomly selected’ be identified?
Related to this, participants identified by field 41000 have instance 0 samples in all batches 0-7, but instances 2 & 3 are confined to batch 7. Looking at the schematic in the ‘NPX calculation and normalization’ section of the supplement of the Sun et al. paper, I'm not clear whether ‘Set 2’ is ALL samples for ‘COVID-19 imaging participants’ (so the samples in batches 0-6, and some in 7, are overlap between random sample and COVID) or whether ‘Set 2’ is instance 2 and 3 samples only.
Thank you and regards,
Neil
Hi,
I have a related question. If I understand correctly, in the UKB-PPP paper (Sun et al., 2023) it mentions that some plates from batches 0–6 were normalized separately. Is there a way to identify these samples?
Reading the supplementary information it is not clear to me how these could be identified.
I have the same question as Amy: how can the plates that were normalized separately be identified?