I would like to exclude related individuals from my cohort. When working on UK Biobank data previously, I used the pre-computed relatedness file listing the pairwise KING kinship coefficients of all individuals related up to the third degree, as described here: https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=531.
I can't seem to find any comparable file on the RAP. I could use the information from Data Field 22021 to filter for individuals with "No kinship found", but I would prefer to exclude only one of two in a related pair. Data fields 22011-22012 don't seem to cover relatedness up to the 3rd degree.
Could anyone point me in the right direction? If no pre-computed data is available, is it possible to use the KING relationship inference software on the RAP?
Comments
10 comments
Until recently, the relatedness file was one of the few items (apart from Returns) that was not available on the RAP. The good news is that it is now present, and can be found in the folder Bulk > Genotype Results > Genotype Calls as the file ukb_rel.dat.
If you cannot see the file in that location in your project, then it might be because your project was dispensed before the file was uploaded.
The bad news is that it is not currently possible to re-dispense any project; see this announcement: https://community.dnanexus.com/s/question/0D582000004afbhCAA/data-dispensalrefresh-temporarily-unavailable .
Please have a look at the date of your currently-dispensed dataset. If it is later than April 2023, please wait until the dispensing issue is resolved and then re-dispense your project.
Please do not try to re-dispense your project today, or your current data will become unusable.
Normal Tier-1/2/3 projects could request the rel file in a basket and upload it to the RAP project storage, but this option is not available for student-tier projects.
Note that some of the related individuals are in a related triplet rather than a pair, and a few are in even larger groupings.
Hello,
Is it possible to get an update on this same issue? I cannot see the file ukb_rel.dat in the mentioned location.
What is the current status of access to the relatedness information (ukb_rel.dat) following the recent announcement about accessing UKB data? Is it possible to re-dispense project data at this point?
I would be grateful to have some pointers on this issue.
Thank you
Regards
Researchers who were using RAP projects before April 2024 should still be able to use their RAP projects, and should be able to dispense new RAP projects.
It is still not possible to Refresh a RAP project. If your current RAP project was dispensed before 30th November 2023, then you can access new data (version 18 from November 2023) by dispensing a new RAP project.
Once the new project is dispensed, it will be possible to copy any code or derived results from the old project to the new project. Once you are happy that the new project is working correctly, you can delete the old project.
If your current RAP project was dispensed after 30th November 2023, then dispensing a new project would not find any new data.
Hi Rachael,
My project was created this month; however, I cannot see ukb_rel.dat under Bulk > Genotype Results > Genotype Calls.
The data preview under genotype calls does not show any information. I would appreciate any help on this matter.
Thank you very much.
Hi Ciaran,
the ukb_rel.dat file is a Bulk file, not part of the cohort browser's Parquet database.
I suggest you have a look in the Bulk folder from the main page of your project, before you open the cohort browser.
For more details on the Bulk data and the Tabular Parquet data, see this page https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/accessing-data/accessing-phenotypic-data .
To use the genotype calls together with the rel file, you will probably want to start a JupyterLab, not the cohort browser. To combine the bulk data with data from the Parquet database, you will probably want to export some of the Parquet data into a csv. There are a few forum posts about ways to do this. See for example this thread about extracting phenotypes, https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/19671290524317-How-to-extract-all-the-phenotypes-available-for-a-single-individual .
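As a rough illustration of combining an exported table with a list of participants to drop, here is a minimal pandas sketch. It assumes the exported csv has a participant ID column named eid; the p31 column, the ID values and the helper name drop_excluded are made up for the example and are not taken from any real export.

```python
import io

import pandas as pd


def drop_excluded(cohort, exclude_ids, id_column="eid"):
    """Return the rows of `cohort` whose ID is not in `exclude_ids`."""
    keep = ~cohort[id_column].isin(exclude_ids)
    return cohort.loc[keep].reset_index(drop=True)


# Made-up stand-in for a csv exported from the Parquet database.
# In a real session this would be pd.read_csv("<your exported file>.csv").
exported_csv = io.StringIO(
    "eid,p31\n"
    "1000011,0\n"
    "1000022,1\n"
    "1000033,1\n"
)
cohort = pd.read_csv(exported_csv)

# Hypothetical set of participants to drop, however it was derived.
filtered = drop_excluded(cohort, {1000022})
print(filtered)
```

The same function works for any ID column name by passing id_column explicitly.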
Thank you for using the forum.
Thank you very much Rachael.
I’ve found the ukb_rel.dat file, and my objective is to use the kinship coefficients to exclude related participants from my cohort. However, I’m unsure how to proceed with this file to achieve that goal.
I would greatly appreciate step-by-step instructions, as I want to ensure I understand each part of the process thoroughly to carry out the analysis correctly.
Any help you can offer would be incredibly valuable. Thank you in advance for your time and advice!
Hi Huabing,
If the objective is to exclude all related participants, then Field 22021 can be used to select only the 339,000 participants with no relatives in UKB. Notice that this excludes both of any two participants who are related to each other. To create a cohort that excludes all participants with any relatives in UKB, use the Cohort Browser and add a filter on Field 22021, selecting the value "No kinship found".
If the objective is to include only one of any pair of related participants, then the ukb_rel.dat file will be needed.
To use the ukb_rel.dat file, one way is to start a JupyterLab session from the Tools tab, open a terminal (shown as $_), and copy the file into the JupyterLab instance storage with the command dx download "Bulk/Genotype Results/Genotype calls/ukb_rel.dat" (the quotes are needed because the folder names contain spaces).
Then open a Python or R kernel, and use Python or R code to manipulate the data. Remember to save any results back to your main project file storage using dx upload commands.
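The ukb_rel.dat file is a 5-column table giving a pairwise listing of related individuals. The first two columns hold the pseudo-IDs of the pair, and they are accompanied by the values:
HetHet : the fraction of markers for which the pair both have a heterozygous genotype;
IBS0 : the fraction of markers for which the pair shares zero alleles;
Kinship : estimate of the kinship coefficient for the pair, based on the set of markers used in the kinship inference.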
In any pair where one or more of the participants has withdrawn, both pseudo-IDs are replaced by negative numbers.
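A minimal Python sketch of loading the table and discarding the pairs with withdrawn participants might look like the following. It assumes the file is whitespace-delimited with a header row naming the columns ID1, ID2, HetHet, IBS0 and Kinship; please check the first few lines of your own copy before relying on that. The rows below are invented for illustration, and load_relatedness is just a helper name chosen here.

```python
import io

import pandas as pd


def load_relatedness(source):
    """Read the pairwise table and drop pairs involving withdrawn participants."""
    # Assumption: whitespace-delimited text with a header row.
    rel = pd.read_csv(source, sep=r"\s+")
    # Withdrawn participants are represented by negative pseudo-IDs.
    keep = (rel["ID1"] > 0) & (rel["ID2"] > 0)
    return rel.loc[keep].reset_index(drop=True)


# Invented rows standing in for ukb_rel.dat.
# In a real session, pass the downloaded file path instead: load_relatedness("ukb_rel.dat")
example = io.StringIO(
    "ID1 ID2 HetHet IBS0 Kinship\n"
    "1000011 1000022 0.051 0.0160 0.0610\n"
    "1000033 1000044 0.102 0.0020 0.2510\n"
    "-1 -2 0.050 0.0150 0.0600\n"
)
rel = load_relatedness(example)
print(rel)
```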
Close relatives can then be inferred fairly reliably based on the estimated kinship coefficients as follows:
> 0.354 : duplicate/MZ twin
0.177 - 0.354 : 1st-degree
0.0884 - 0.177 : 2nd-degree
0.0442 - 0.0884 : 3rd-degree
See KING Relatedness kinship inference for more details.
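The thresholds above can be turned into a small lookup function. This is only a sketch: how values falling exactly on a boundary are assigned is an assumption made here, and the label used for values below the 3rd-degree range is my own wording. The function name kinship_degree is hypothetical.

```python
def kinship_degree(kinship):
    """Map an estimated kinship coefficient to a relationship category."""
    # Assumption: a value exactly on a boundary falls into the more distant category.
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship > 0.177:
        return "1st-degree"
    if kinship > 0.0884:
        return "2nd-degree"
    if kinship > 0.0442:
        return "3rd-degree"
    return "more distant than 3rd-degree"


# A few made-up kinship values.
for value in (0.40, 0.25, 0.10, 0.06):
    print(value, kinship_degree(value))
```

With pandas, the same function can be applied to the Kinship column of the table, for example with rel["Kinship"].map(kinship_degree), if you only want to exclude relatives up to a chosen degree.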
The relatedness file was generated from all participants with genotype information and has been filtered to include individuals with 3rd-degree kinship or closer. Around 3% of UKB participants are missing as insufficient DNA was available. For more information on the genetic relatedness file please see Bycroft et al, Section 3.7 of https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-018-0579-z/MediaObjects/41586_2018_579_MOESM1_ESM.pdf .
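To keep one participant from each related pair while also coping with triplets and larger groups, one possible approach is a simple greedy rule: visit participants in order of how many relatives they have in the file, and drop a participant whenever they are still related to someone who has not been dropped. This is a heuristic sketched for illustration, not an official UKB procedure, and it is not guaranteed to retain the largest possible number of participants. The function name choose_exclusions and the IDs below are made up.

```python
from collections import defaultdict


def choose_exclusions(pairs):
    """Return a set of IDs whose removal leaves no related pair in the cohort."""
    relatives = defaultdict(set)
    for a, b in pairs:
        if a != b:
            relatives[a].add(b)
            relatives[b].add(a)

    # Visit those with the most relatives first; ties are broken by ID
    # so that the result is reproducible.
    order = sorted(relatives, key=lambda i: (-len(relatives[i]), i))

    removed = set()
    for person in order:
        if relatives[person]:  # still related to someone not yet removed
            for other in relatives[person]:
                relatives[other].discard(person)
            relatives[person] = set()
            removed.add(person)
    return removed


# Made-up pairs: one simple pair, one person with two relatives,
# and one fully connected triplet.
pairs = [(1, 2), (3, 4), (3, 5), (6, 7), (6, 8), (7, 8)]
to_remove = choose_exclusions(pairs)
print(sorted(to_remove))
```

If only the relatives inside your own cohort matter, restrict the pairs to rows where both IDs are in the cohort before calling the function, so that nobody is dropped because of a relative who is not in the analysis anyway.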
Thank you for using the forum.
Thank you very much Rachael!