Hi, everyone. I have a list of several SNPs of interest, and I would like to get the data only for individuals with these SNPs. Could you please guide me through the step-by-step procedure for doing this?
My next question is: how do I link my derived data (i.e. the list of individuals with the SNPs of interest) to the other parameters (phenotypic and all other details)? The data folders are separated by category, so the specific steps for doing this would be very helpful.
My main goal is to have a complete data set only for the selected SNPs, derived from the bulk data that I have.
Thanks very much in advance to those willing to help.
Comments
I'm interested too. I'm sure it will be useful for everyone. I hope to see a step-by-step (reproducible) instruction that everyone can repeat.
I also think such instructions should show how to combine phenotypic data with genomic data. For example, I propose taking phenotypic data from any CSV file and enriching it with genetic data from Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ for chromosome 8, positions 127 735 434-127 741 434.
If it is in Python, it will be just fine
Esphie, if you had something else in mind, let me know and I will delete my addition and post it as a separate question to the team.
@Alex Shemy, I believe @Madhusmita Rout did something similar before, as per a previous post. I have also sent her a message and am waiting for a response. As it is, yes, we do need step-by-step instructions. I really hope we get it sorted soon.
A) For applications where you would like to combine geno and pheno data, there is a guide to the filename conventions for Bulk data: https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/working-with-ukb-data#filename-conventions
Typically each file has the eid in its filename, so the eid would be the primary key for merging the geno and pheno datasets.
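To make the idea in A) a bit more concrete, here is a minimal Python sketch of using the eid as the join key. It assumes per-participant bulk filenames of the form `<eid>_<field>_<instance>_<array>.<extension>`, which should be checked against the filename conventions page linked in A). The function names, the example filenames and the phenotype rows are all made up for illustration.

```python
import re

# Assumed bulk filename pattern: <eid>_<field>_<instance>_<array>.<extension>
# (verify against the filename conventions guide before relying on it).
BULK_NAME = re.compile(r"^(?P<eid>\d+)_(?P<field>\d+)_(?P<instance>\d+)_(?P<array>\d+)\.")


def eid_from_filename(name):
    """Return the eid at the start of a bulk filename, or None if it does not match."""
    match = BULK_NAME.match(name)
    return match.group("eid") if match else None


def join_files_to_phenotypes(filenames, pheno_rows):
    """Attach the phenotype row with the same eid to each bulk file (inner join)."""
    pheno_by_eid = {row["eid"]: row for row in pheno_rows}
    joined = []
    for name in filenames:
        eid = eid_from_filename(name)
        if eid in pheno_by_eid:
            joined.append({"file": name, **pheno_by_eid[eid]})
    return joined


# Hypothetical inputs: two bulk files, one unrelated file, two phenotype rows.
example_files = ["1000001_99999_0_0.zip", "1000002_99999_0_0.zip", "notes.txt"]
example_pheno = [{"eid": "1000001", "sex": "F"}, {"eid": "1000003", "sex": "M"}]
example_joined = join_files_to_phenotypes(example_files, example_pheno)
```

Only files whose eid also appears in the phenotype table survive the join, so participants missing from either side drop out silently; it may be worth counting them before and after.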
B) Also I am sharing some recent in-depth materials that show the comprehensive geno/pheno pipelines:
https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/gwas-ex
https://github.com/dnanexus/UKB_RAP/tree/main/end_to_end_gwas_phewas
C) For some other use cases, just a pointer: another approach might be to iterate over an exported list of eids (the cohort of interest) to download and process Bulk files. I had a conversation about this for Imaging data on RAP: https://community.dnanexus.com/s/question/0D5t000004EtXLYCA3/is-there-a-way-to-extract-the-bulk-imaging-data-using-the-spark-jupyter-notebook
D) Much of the geno/pheno data has been ingested into a database. You can access and query the tables directly: https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-spark-to-analyze-tabular-data#accessing-the-database-directly-using-sql
Another option is to use Hail. Here is an example notebook that performs a GWAS (it includes tips on creating the pheno table): https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/gwas.ipynb
How to filter Genomic Data with Hail: Chromosomes and Positions
https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_chrpos.ipynb
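For anyone who wants to see the chromosome-and-position filter without Hail, here is a rough plain-Python sketch that reads the lines of a PLINK `.bim` file. It relies on the standard six-column `.bim` layout (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2). The function name and the example variants are invented; the region is the chromosome 8 window mentioned earlier in this thread. This is not what the notebook does, just a simplified stand-in.

```python
def variants_in_region(bim_lines, chrom, start, end):
    """Return (row index, variant ID, position) for .bim rows inside chrom:start-end, inclusive."""
    hits = []
    for index, line in enumerate(bim_lines):
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        row_chrom, variant_id, position = fields[0], fields[1], int(fields[3])
        # Accept both "8" and "chr8" style chromosome labels.
        if row_chrom.startswith("chr"):
            row_chrom = row_chrom[3:]
        if row_chrom == str(chrom) and start <= position <= end:
            hits.append((index, variant_id, position))
    return hits


# Made-up .bim rows around the region of interest.
example_bim = [
    "8\tvar_a\t0\t127735433\tA\tG",
    "8\tvar_b\t0\t127735434\tC\tT",
    "8\tvar_c\t0\t127741434\tG\tA",
    "8\tvar_d\t0\t127741435\tT\tC",
    "9\tvar_e\t0\t127740000\tA\tC",
]
example_hits = variants_in_region(example_bim, 8, 127735434, 127741434)
```

With a real file, `bim_lines` can simply be the open file object. The resulting variant IDs could then be written one per line to a text file and passed to PLINK's `--extract` option to subset the genotype data.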
Thanks very much, @Ondrej Klempir! I am studying all these references now. I hope you don't mind if I have further clarifications or questions, here or via message. Many thanks again, such great help!
Hello @Esphie Fojas, sure thing! I am also personally interested in this topic, and I am wondering whether a step-by-step tutorial on this (a simplified version of the whole thing, of course) would be a good topic for my next Query of the week this Friday. No promises...
Hello @Ondrej Klempir, thank you for your response.
I would be delighted if you could provide a step-by-step example by Friday.
I would appreciate it if you could also run the code yourself and verify its functionality.
I've noticed that there are many code samples and examples available, but when I try to run them for my project, they don't work and I can't obtain the desired result. In short, I'm looking for a reproducible example.
Thank you in advance.
Very much looking forward to it indeed, @Ondrej Klempir. Thanks very much in advance!
Documentation for the workflow you proposed is here: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/gwas.ipynb
It is a Python Hail GWAS tutorial that demonstrates how to combine pheno and geno data for GWAS analysis.
You may of course include more steps, e.g. filtering for specific positions: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_chrpos.ipynb
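As a small, Hail-free illustration of the last step the original question asks about (keeping only individuals who carry the SNPs of interest and attaching their phenotypes), here is a hedged sketch. It assumes the selected variants were already exported to an additive `.raw` text file, as produced by PLINK 1.9 `--recode A`: six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE) followed by one `<variant ID>_<counted allele>` column per variant holding 0, 1, 2 or NA. It also assumes the IID matches the `eid` column of the phenotype CSV. The function names, SNP IDs and all values are made up, and "carrier" here means at least one copy of the counted allele.

```python
import csv
import io


def carriers_from_raw(raw_lines, snp_ids):
    """Map IID -> {genotype column: allele count} for samples carrying any SNP of interest."""
    lines = iter(raw_lines)
    header = next(lines).split()
    # Genotype columns start after the six fixed columns and are named <ID>_<allele>.
    wanted = [i for i, name in enumerate(header)
              if i >= 6 and name.rsplit("_", 1)[0] in snp_ids]
    carriers = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Missing genotypes are written as NA and are left out of the counts.
        counts = {header[i]: int(fields[i]) for i in wanted if fields[i] != "NA"}
        if any(count > 0 for count in counts.values()):
            carriers[fields[1]] = counts
    return carriers


def attach_phenotypes(carriers, pheno_file, id_column="eid"):
    """Keep phenotype rows whose ID is a carrier and append the allele counts."""
    merged = []
    for row in csv.DictReader(pheno_file):
        counts = carriers.get(row[id_column])
        if counts is not None:
            merged.append({**row, **counts})
    return merged


# Made-up inputs standing in for a real .raw export and a phenotype CSV.
example_raw = io.StringIO(
    "FID IID PAT MAT SEX PHENOTYPE rs111_A rs222_T rs333_G\n"
    "1001 1001 0 0 1 -9 0 1 2\n"
    "1002 1002 0 0 2 -9 0 0 2\n"
    "1003 1003 0 0 2 -9 2 NA 0\n"
)
example_pheno = io.StringIO("eid,age\n1001,61\n1002,58\n1003,70\n")
example_carriers = carriers_from_raw(example_raw, {"rs111", "rs222"})
example_merged = attach_phenotypes(example_carriers, example_pheno)
```

In this toy input, sample 1002 is dropped because it has no copies of the counted allele at either SNP of interest, even though it carries rs333, which was not requested. For real cohort sizes the same logic is better expressed in Hail or Spark as in the notebooks above; this sketch is only meant to show the join on the participant ID.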