SNP filtering for genomic regions

Hi all,  

I am following a tutorial developed by Oliver Gray (many thanks, Oliver) (https://github.com/UK-Biobank/SNP-filtering), which explains how to obtain a list of SNPs. However, not all SNPs are available by rs ID, so I am trying to work out how to select SNPs by their chromosome position. I believe the same tutorial covers this, but it does not explain how to fill in the external file (genomic_regions.txt) used to select the SNPs. For example, if I am interested in two SNPs, one on chromosome 1 and the other on chromosome 2, such as 1:182600492 and 2:66523433, how should I fill in the txt file?

Example 1:

1:182600492

2:66523433

Example 2:

chr1:182600492,chr2:66523433

Example 3:

Something else. 

 

Thanks for your help. 

Ian

 

PS: this discussion started on a previous post (https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/18669657313437-How-do-I-extract-allele-combinations-at-specific-SNPs-using-Jupyterlab), but Rachel suggested that I start a new post as the topic has shifted a little.

 

Comments


  • Lea K. (Data Analyst, UKB Community team)

    Hi Ian,

    Thank you for reaching out. We are working on enhancing the documentation and tool for filtering SNPs in the genotyping data.  

    If you want to filter the data based on genomic positions, the genomic_regions.txt file should have the following structure:

     1    30000000    35000000    R1

     4    60000000    62000000    R2

    The text file should contain four tab-separated columns: chromosome (e.g. 1, 15, X), region start (in base pair coordinates), region end (also in base pair coordinates), and a user-selected identifier for the region. Each region of interest should be on a separate line. Hope this helps.

     

    Thank you for using the Community forum.
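As an illustration of the layout described above, here is a minimal Python sketch that writes the two example regions to genomic_regions.txt as tab-separated lines. The function name `write_regions` is hypothetical, and the sketch assumes the file takes no header row, as in the example shown; it is not part of the tutorial's own tooling.

```python
import csv

# Regions from the example above: (chromosome, start, end, identifier).
regions = [
    ("1", 30000000, 35000000, "R1"),
    ("4", 60000000, 62000000, "R2"),
]


def write_regions(path, rows):
    """Write one tab-separated region per line (assumption: no header row)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


write_regions("genomic_regions.txt", regions)
```

Writing the file from code rather than typing it by hand avoids accidentally using spaces instead of tabs between the columns.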

     

     

     

  • Ian Meneghel Danilevicz

    Thank you Lea!

    So, for example, if I am interested in rs17400325 and rs9369062, which I can find in the GWAS Catalog, the structure would be something like this:

     

    2    177701185    177701185    rs17400325

    6    38469527    38469527    rs9369062

     

    This is because the first one is on chromosome 2 at base pair position 177701185, and the second is on chromosome 6 at base pair position 38469527.

    Best

     

     

  • Lea K. (Data Analyst, UKB Community team)

    Hi Ian,

    That's the correct structure. However, please note that base positions in the genotyping data are in GRCh37 coordinates, so the start and end positions for these variants will differ from the ones you listed.

    Hope this helps. Thank you for using the Community Forum.
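To tie this back to the original question, the following Python sketch turns a single position written as chromosome:position (for example 1:182600492) into a one-base region line whose start and end are equal, following the structure confirmed above. The helper names `position_to_region` and `format_region` and the labels used in the usage example are hypothetical. The sketch performs no coordinate conversion, so it assumes the positions passed in are already GRCh37 coordinates.

```python
def position_to_region(position, label):
    """Turn 'chrom:pos' into (chrom, start, end, label) with start == end.

    Assumption: the position is already in GRCh37 coordinates.
    """
    chrom, _, pos = position.partition(":")
    # Accept an optional 'chr' prefix, since the example file uses bare names.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    base = int(pos)  # raises ValueError if the position part is not a number
    return (chrom, base, base, label)


def format_region(region):
    """Join the four fields with tabs, as the file format requires."""
    return "\t".join(str(field) for field in region)
```

For example, `format_region(position_to_region("chr2:66523433", "snp2"))` gives the four fields 2, 66523433, 66523433 and snp2 joined by tabs, ready to be written as one line of genomic_regions.txt.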
