Selecting SNPs of Interest and Linking These (Selected SNPs) to Rest of Data

21 March 2023 00:00
12 comments

Hi, everyone. I have a list of several SNPs of interest and I would like to get the data only for individuals with these SNPs. Could you please guide me through the step-by-step procedure for doing this? My next question is: how do I link my derived data (i.e. the list of individuals with the SNPs of interest) to the other parameters (phenotypic and all other details)? The data folders are separated by category, so the specific steps for doing this would be very helpful. My main goal is to have a complete data set for the selected SNPs only, derived from the bulk data that I have. Thanks very much in advance to those willing to help.

Comments


Permanently deleted user
- 21 March 2023 22:07
I'm interested too. I'm sure it will be useful for everyone. I hope to see a step-by-step (reproducible) instruction that everyone can repeat.
I also think such instructions should show how to combine phenotypic data with genomic data. For example, I propose taking phenotypic data from any CSV file and enriching it with genetic data from Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ for chromosome 8, positions 127 735 434-127 741 434.
If it is in Python, that would be just fine.

Permanently deleted user
- 21 March 2023 22:35
Esphie, if you had something else in mind, let me know and I will delete my addition and file it as a separate question to the team.

Former User of DNAx Community_83
- 21 March 2023 23:20
@Alex Shemy, I believe @Madhusmita Rout did something similar before, as per a previous post. I have also sent her a message and am just waiting for a response. As it is, yes, we do need step-by-step instructions. I really hope we get it sorted soon.

Ondrej Klempir DNAnexus Team
- 27 March 2023 13:09
A) For applications where you would like to combine geno and pheno data, there is a guide to the filename conventions for Bulk data: https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/working-with-ukb-data#filename-conventions
Typically each file contains the eid in its filename, so the eid is the primary key for merging the geno and pheno datasets.
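
A minimal sketch of a merge on eid, using only the Python standard library. The phenotype columns (`age`, `sex`), the eid values, and the `merge_on_eid` helper are all hypothetical and made up for illustration; a real phenotype export will have different fields.

```python
import csv
import io

# Hypothetical phenotype export: one row per participant, keyed by eid.
PHENO_CSV = """eid,age,sex
1000001,54,F
1000002,61,M
1000003,47,F
"""

# Hypothetical set of eids for participants carrying a SNP of interest.
carrier_eids = {"1000001", "1000003", "1000009"}


def merge_on_eid(pheno_csv_text, eids):
    """Return the phenotype rows (as dicts) whose eid is in `eids`."""
    reader = csv.DictReader(io.StringIO(pheno_csv_text))
    return [row for row in reader if row["eid"] in eids]


merged = merge_on_eid(PHENO_CSV, carrier_eids)
for row in merged:
    print(row["eid"], row["age"], row["sex"])
```

This behaves like an inner join: an eid that appears in only one of the two inputs (here `1000009`) is silently dropped, so it is worth comparing the counts before and after the merge.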

B) Also I am sharing some recent in-depth materials that show the comprehensive geno/pheno pipelines:
https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/gwas-ex
https://github.com/dnanexus/UKB_RAP/tree/main/end_to_end_gwas_phewas

C) For some other use cases, just a pointer: another approach might be to iterate over an exported list of eids (the cohort of interest) to download and process Bulk files. I had a conversation about Imaging data on RAP: https://community.dnanexus.com/s/question/0D5t000004EtXLYCA3/is-there-a-way-to-extract-the-bulk-imaging-data-using-the-spark-jupyter-notebook
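
A minimal sketch of that iteration, assuming each Bulk filename starts with the eid followed by an underscore (check this against the filename-conventions guide linked under A). The filenames, eids, and helper functions below are hypothetical, and the download step itself is left out.

```python
def eid_from_filename(filename):
    """Return the leading eid, assuming the pattern '<eid>_<rest>'."""
    return filename.split("_", 1)[0]


def select_files(filenames, cohort_eids):
    """Keep only the files whose leading eid belongs to the cohort."""
    cohort = set(cohort_eids)
    return [name for name in filenames if eid_from_filename(name) in cohort]


# Hypothetical Bulk file listing and exported cohort.
bulk_files = [
    "1000001_20252_2_0.zip",
    "1000002_20252_2_0.zip",
    "1000003_20252_2_0.zip",
]
cohort = ["1000001", "1000003"]

selected = select_files(bulk_files, cohort)
for name in selected:
    print(name)  # a real pipeline would download and process the file here
```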

D) Much of the geno/pheno data was ingested into a database. You can access and query the tables directly: https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/using-spark-to-analyze-tabular-data#accessing-the-database-directly-using-sql

Ondrej Klempir DNAnexus Team
- 27 March 2023 13:14
Another option is to use Hail. Here is an example notebook that performs a GWAS (it includes tips on creating the pheno Table): https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/gwas.ipynb

Ondrej Klempir DNAnexus Team
- 27 March 2023 13:19
How to filter Genomic Data with Hail: Chromosomes and Positions
https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_chrpos.ipynb
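
The notebook above does this with Hail. As a plain-Python alternative (not the notebook's method), here is a minimal sketch of the same chromosome/position filter applied to a PLINK `.bim` variant table, whose columns are chromosome, variant ID, genetic distance, base-pair position, allele 1, and allele 2. The variant IDs and positions are made up, and the chromosome is assumed to be coded as `8` rather than `chr8`.

```python
# Hypothetical .bim content (whitespace-separated, no header row).
BIM_TEXT = """8\trs_a\t0\t127735000\tA\tG
8\trs_b\t0\t127735434\tC\tT
8\trs_c\t0\t127740000\tG\tA
8\trs_d\t0\t127741435\tT\tC
9\trs_e\t0\t127740000\tA\tC
"""


def variants_in_region(bim_text, chrom, start, end):
    """Return IDs of variants on `chrom` with start <= position <= end."""
    hits = []
    for line in bim_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        variant_chrom, variant_id, position = fields[0], fields[1], int(fields[3])
        if variant_chrom == chrom and start <= position <= end:
            hits.append(variant_id)
    return hits


# Region proposed earlier in the thread: chromosome 8, 127735434-127741434.
selected_variants = variants_in_region(BIM_TEXT, "8", 127_735_434, 127_741_434)
print(selected_variants)
```

The resulting list of variant IDs can then be passed to whichever tool extracts the genotypes for those variants.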

Former User of DNAx Community_83
- 29 March 2023 08:47
Thanks very much, @Ondrej Klempir! I am studying all these references now. I hope you don't mind if I have further clarifications/questions, here or via message. Many thanks again, such great help!

Ondrej Klempir DNAnexus Team
- 29 March 2023 08:55
Hello @Esphie Fojas, sure thing! I am also personally interested in this topic, and I am wondering whether a step-by-step tutorial on it (a simplified version of the whole thing, of course) would be a good topic for my next Query of the week this Friday. No promises...

Permanently deleted user
- 30 March 2023 00:16
Hello @Ondrej Klempir, thank you for your response.
I would be delighted if you could provide a step-by-step example by Friday.
I would appreciate it if you could also run the code yourself and verify its functionality.
I've noticed that there are many code samples and examples available, but when I try to run them for my project, they don't work and I'm unable to obtain the desired result. In short, I'm looking for a reproducible example.
Thank you in advance.

Former User of DNAx Community_83
- 30 March 2023 10:27
Very much looking forward to it indeed, @Ondrej Klempir. Thanks very much in advance!

Ondrej Klempir DNAnexus Team
- 31 March 2023 12:01
Documentation for the workflow you proposed is here: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/gwas.ipynb

It is a Python Hail GWAS tutorial that demonstrates how to combine pheno and geno data for GWAS analysis.
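
For the narrower case asked about in this thread (keeping only individuals who carry selected SNPs), here is a minimal plain-Python sketch rather than the tutorial's Hail code. It assumes the genotypes were exported with PLINK's additive recoding (`--recode A`), which writes a `.raw` table with one row per sample and one allele-count column per variant, and it assumes the `IID` column holds the eid. The sample IDs, column names, and the `carriers` helper are hypothetical.

```python
# Hypothetical .raw content: six fixed columns, then one column per variant
# holding the count (0, 1, 2, or NA) of the allele named in the column suffix.
RAW_TEXT = """FID IID PAT MAT SEX PHENOTYPE rs_b_T rs_c_A
1000001 1000001 0 0 2 -9 0 1
1000002 1000002 0 0 1 -9 0 0
1000003 1000003 0 0 2 -9 2 NA
1000004 1000004 0 0 1 -9 NA 0
"""


def carriers(raw_text, snp_columns):
    """Return IIDs with a non-missing count > 0 in any of `snp_columns`."""
    lines = [line for line in raw_text.splitlines() if line.strip()]
    header = lines[0].split()
    iid_index = header.index("IID")
    snp_indexes = [header.index(name) for name in snp_columns]
    found = []
    for line in lines[1:]:
        fields = line.split()
        counts = [fields[i] for i in snp_indexes]
        # Missing genotypes ("NA") are treated as non-carriers here.
        if any(count != "NA" and float(count) > 0 for count in counts):
            found.append(fields[iid_index])
    return found


carrier_ids = carriers(RAW_TEXT, ["rs_b_T", "rs_c_A"])
print(carrier_ids)
```

Whether the counted allele is the allele of interest depends on how the export was run, so the column suffixes should be checked before the list is used. The resulting IDs can serve as the key for joining to the phenotype data.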

Ondrej Klempir DNAnexus Team
- 31 March 2023 12:02
You may of course include more steps, e.g. filtering for specific positions: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/filter_chrpos.ipynb

