Hi there. Can you help me extract genotype data for specific variants from imputed BGEN files using the "bigsnpr" R package?

I'm new to working with imputed genotype data and BGEN files. I have written some code to extract an example SNP on chromosome 22 (22_42524947_C_T), but I'm not sure my method is correct. I extracted the genotype data for this SNP, but it came without participant IDs. I used the "sample" file to get the IDs, then combined the two (the genotype column with the ID column). However, I'm not sure the IDs are in the same order as the genotype data. Please have a look at my code below and let me know if anything needs correcting; if there are errors, please show me the correct syntax:

 

 

 install.packages("bigsnpr")
 library(bigsnpr)

 x <- snp_readBGEN(
   bgenfiles = "ukb22828_c22_b0_v3.bgen",
   backingfile = "ukb22828_c22_b0_v3.bgen.bk",
   list_snp_id = list("22_42524947_C_T"),
   ind_row = NULL,
   read_as = c("dosage", "random"),
   ncores = 8)

 R <- snp_attach(rdsfile = "ukb22828_c22_b0_v3.bgen.bk.rds")

 Genotype.data <- as.data.frame(R$genotypes[1:487409, ])

 sample <- bigreadr::fread2("ukb22828_c22_b0_v3.sample")[-1, ]

 ID.sex <- sample[, c(2, 4)]

 Final.Genotype <- cbind(ID.sex, Genotype.data)

 

 

 

 

 

 

Comments


  • Comment author
    Chai Fungtammasan DNAnexus Team

    If you can use Python, I can recommend some packages that I successfully use on bgen.

  • Comment author
    Former User of DNAx Community_88

    Hi dear Chai,

    Thank you so much for your reply.

    I'm afraid I have no experience with Python, to be honest, but many thanks for taking the time to help. I appreciate it.

    I hope that someone here can help me with R. If not, I may ask you how to do this in Python, and I will start learning.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    I have experimented with the bgen and bgen_reader packages in Python. Both work well. Each has pros and cons in terms of what information it extracts for you.

     

    https://bgen-reader.readthedocs.io/en/latest/quickstart.html

     

    https://github.com/jeremymcrae/bgen

     

    bgen-reader has more documentation, so you could start with that. I don't think you need to learn Python extensively. I have found Python very helpful in my bioinformatics career, but it takes about a week to learn the basics and many months to get good at it. Basic Python knowledge should be sufficient here. You can simply extract the data, save it to a CSV file, and do the rest in R if that is your preferred choice.

     

     

    Here are the steps you can follow if you want to try the Python option.

    1) Launch JupyterLab and select a Python/R kernel. You can also use ttyd or cloud_workstation. I made this example using cloud_workstation with ipython inside it.

    2) Use the terminal to download the BGEN file of interest to the workstation with dx download <file-id>

    3) Install bgen-reader with pip install bgen-reader

    4) Launch a Python notebook or ipython. Copy the code from the bgen_reader quickstart into the notebook. If it looks confusing, you can use the minimal example code I made below.

     

    from pathlib import Path
    from bgen_reader import read_bgen

    filename = "ukb22828_c21_b0_v3.bgen"  # replace with the file name you want
    file_path = Path("/home/dnanexus/" + filename)  # the path depends on where you keep the file

    bgen = read_bgen(file_path, verbose=True)

    # At this point the data is read. You can use the examples in
    # https://bgen-reader.readthedocs.io/en/latest/quickstart.html to see the data you need.
    # For example:
    print(bgen["variants"].head())

     

     

    5) Now you need a bit of Python knowledge to format the data the way you want and save it to a file, so that you can use it in R.
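As a rough illustration of step 5, here is a minimal sketch that pairs participant IDs with one variant's dosage and writes a CSV that R can read. It uses only the standard library and assumes several things that are not in the posts above: the .sample file follows the usual Oxford layout (a column-name line, then a column-type line, then one row per participant), the IDs sit in an ID_2 column as in the question, and the per-sample genotype probabilities [P(AA), P(AB), P(BB)] for the variant have already been pulled out of bgen-reader as plain lists. The function names and the "eid" column label are hypothetical. The row-count check is the part that addresses the ordering concern: the rows of a .sample file are meant to follow the sample order of the matching BGEN file, so the two can only be paired safely when the counts agree.

```python
import csv


def read_sample_ids(sample_path):
    """Return participant IDs from an Oxford-style .sample file, in file order."""
    ids = []
    with open(sample_path) as handle:
        header = handle.readline().split()  # column names, e.g. ID_1 ID_2 missing sex
        handle.readline()                   # second line holds column types, e.g. 0 0 0 D
        # Assumption: IDs are in ID_2 when present, otherwise in the first column.
        id_col = header.index("ID_2") if "ID_2" in header else 0
        for line in handle:
            fields = line.split()
            if fields:
                ids.append(fields[id_col])
    return ids


def alt_allele_dosage(probs):
    """Expected count of the second allele from [P(AA), P(AB), P(BB)]."""
    p_aa, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb


def write_dosage_csv(out_path, sample_ids, prob_rows, column="dosage"):
    """Write one row per participant: ID plus dosage, in the given order."""
    if len(sample_ids) != len(prob_rows):
        # A mismatch means the sample file does not belong to this BGEN file.
        raise ValueError(
            f"{len(sample_ids)} sample IDs but {len(prob_rows)} genotype rows"
        )
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["eid", column])
        for sample_id, probs in zip(sample_ids, prob_rows):
            writer.writerow([sample_id, round(alt_allele_dosage(probs), 4)])
```

In use, you would call read_sample_ids on the .sample file that ships with the BGEN file, pass the resulting list together with the probability rows to write_dosage_csv, and then load the CSV in R with read.csv.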

     

    I will leave this thread open in case anyone has tried your R package. 

     

  • Comment author
    Former User of DNAx Community_88

    Thank you so much, dear Chai, this will be very helpful.

     

    Many thanks for sharing this with me.

     

    Kind Regards

  • Comment author
    Former User of DNAx Community_28

    Alternatively, you can extract the SNP from the BGEN file using bgenix via swiss-army-knife, then read the single-SNP BGEN file with plink and export it as a raw file. The raw file will contain the FID, the IID, and the dosage for the SNP, and it can be read into R or Python easily enough.

     

    -Phil
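For the last step of that route, a small sketch of reading the exported raw file in Python might look like the following. It assumes the usual PLINK .raw layout: a header line with FID, IID, PAT, MAT, SEX and PHENOTYPE followed by one column per variant, whitespace-separated, with NA for missing values. The function name is hypothetical, and the actual variant column name depends on how the variant is labelled in your file.

```python
def read_plink_raw(raw_path):
    """Return (variant_columns, rows) from a PLINK .raw file.

    Each row is a dict holding the IID plus one float dosage per variant
    column (None where the file has NA).
    """
    meta_columns = {"FID", "IID", "PAT", "MAT", "SEX", "PHENOTYPE"}
    rows = []
    with open(raw_path) as handle:
        header = handle.readline().split()
        # Everything that is not a standard metadata column is a variant.
        variant_columns = [name for name in header if name not in meta_columns]
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            record = dict(zip(header, fields))
            row = {"IID": record["IID"]}
            for name in variant_columns:
                value = record[name]
                row[name] = None if value == "NA" else float(value)
            rows.append(row)
    return variant_columns, rows
```

Because the IID travels in the same row as the dosage, this route avoids having to line up a separate ID column with the genotype column by position.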

     

     

     

