Hi there, can you help me extract genotype data for certain variants from imputed BGEN files using the "bigsnpr" R package?

19 December 2022 00:00
5 comments

I'm new to working with imputed genotype data and BGEN files. I have written some code to extract an example SNP on chromosome 22 (22_42524947_C_T), but I'm not sure my method is correct. The genotype data for this SNP came out without participant IDs, so I read the IDs from the ".sample" file and combined the two (the genotype column with the ID column). However, I'm not sure the IDs are in the same order as the genotype rows. Could you please look at my code below, tell me whether anything needs correcting and, if there are errors, suggest the correct syntax?

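install.packages("bigsnpr")

library(bigsnpr)

# snp_readBGEN() needs the .bgi index next to the .bgen file. It appends
# ".bk"/".rds" to `backingfile` and returns the path of the .rds file.
x <- snp_readBGEN(
  bgenfiles   = "ukb22828_c22_b0_v3.bgen",
  backingfile = "ukb22828_c22_b0_v3.bgen.bk",
  list_snp_id = list("22_42524947_C_T"),  # IDs as "<chr>_<pos>_<a1>_<a2>"
  ind_row     = NULL,                     # NULL = all samples, in BGEN order
  read_as     = "dosage",
  ncores      = 8
)

# Attach using the returned path (here "ukb22828_c22_b0_v3.bgen.bk.rds")
R <- snp_attach(x)

# Take every row instead of hard-coding the sample count (487409)
Genotype.data <- as.data.frame(R$genotypes[])

# The second line of a .sample file holds column types ("0 0 0 D"); drop it
sample <- bigreadr::fread2("ukb22828_c22_b0_v3.sample")[-1, ]

# Columns 2 and 4 are ID_2 and sex
ID.sex <- sample[, c(2, 4)]

# Basic sanity check before binding IDs to genotype rows
stopifnot(nrow(ID.sex) == nrow(Genotype.data))
Final.Genotype <- cbind(ID.sex, Genotype.data)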
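Whether the cbind() is safe comes down to row order. In the Oxford .sample format, the first line holds column names, the second holds column types (e.g. "0 0 0 D"), and each following row describes one sample, in the same order as the samples stored in the BGEN file; with ind_row = NULL, snp_readBGEN() should return genotype rows in that same BGEN order. The ordering logic does not depend on the language, so here is a minimal Python sketch of the pairing step on made-up data, with a length check as a basic guard. The function names, IDs and dosages are hypothetical.

```python
def read_sample_file(text):
    """Parse Oxford .sample text into one dict per sample, in file order.

    Line 1 = column names, line 2 = column types (e.g. "0 0 0 D");
    neither is treated as a sample.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[2:]  # lines[1] is the type row
    return [dict(zip(header, row)) for row in rows]


def pair_ids_with_dosages(samples, dosages, id_col="ID_2"):
    """Attach dosage i to sample i; refuse to pair lists of different length."""
    if len(samples) != len(dosages):
        raise ValueError(
            f"{len(samples)} samples but {len(dosages)} dosages; order cannot be trusted"
        )
    return [(s[id_col], s.get("sex"), d) for s, d in zip(samples, dosages)]


# Hypothetical .sample content for three participants.
sample_text = """ID_1 ID_2 missing sex
0 0 0 D
1001 1001 0 1
1002 1002 0 2
1003 1003 0 1
"""
samples = read_sample_file(sample_text)
table = pair_ids_with_dosages(samples, [0.0, 1.02, 1.98])
for row in table:
    print(row)
```

Matching by position is only valid because both files list samples in the same order; if you subset samples with ind_row, subset the .sample rows with the same indices.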

Comments

Chai Fungtammasan DNAnexus Team
- 19 December 2022 17:10
If you can use Python, I can recommend some packages that I have used successfully with BGEN files.

Former User of DNAx Community_88
- 20 December 2022 08:33
Hi Chai,

Thank you so much for your reply.
To be honest, I'm afraid I have no experience with Python, but many thanks for taking the time to help. I appreciate that.

I hope someone here can help me do this in R. If not, I may ask you how to do it in Python. I will start learning it.

Chai Fungtammasan DNAnexus Team
- 20 December 2022 18:51
I have experimented with the bgen and bgen_reader packages in Python. They work well, and each has pros and cons in terms of what information it extracts for you.

https://bgen-reader.readthedocs.io/en/latest/quickstart.html

https://github.com/jeremymcrae/bgen

bgen-reader has more documentation, so you could start with that. I don't think you need to learn Python extensively. Python has been very helpful in my bioinformatics career, but it takes about a week to learn the basics and many months to get good at it; basic Python knowledge should be sufficient here. You can just extract the data, save it to a CSV file, and do the rest in R if you prefer.

Here are some steps you can follow if you want to try the Python option.

1) Launch JupyterLab and select the Python/R kernel. You may use ttyd or cloud_workstation as well. I made this example using cloud_workstation and IPython inside it.

2) Use the terminal to download the BGEN file of interest to the workstation with dx download <file-id>

3) Install bgen-reader with pip install bgen-reader

4) Launch a Python notebook or IPython and copy in the example code from the bgen-reader documentation. If that looks confusing, you can use the minimal example I made below.

from pathlib import Path

from bgen_reader import read_bgen

filename = "ukb22828_c21_b0_v3.bgen"  # replace with the file name you want
file_path = Path("/home/dnanexus") / filename  # the path depends on where you keep the file

bgen = read_bgen(file_path, verbose=True)

# At this point the data is read. You can use the examples in
# https://bgen-reader.readthedocs.io/en/latest/quickstart.html to get the data you need,
# for example:
print(bgen["variants"].head())

5) Now you will need a bit of Python knowledge to format the data the way you want and save it to a file that you can use in R.
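For step 5, a common target is one dosage per sample. Assuming a biallelic variant, diploid samples, and unphased probabilities ordered as [P(A1/A1), P(A1/A2), P(A2/A2)] (worth confirming against the bgen-reader quickstart for your files), the expected count of the second allele is P(A1/A2) + 2 * P(A2/A2). The sketch below uses plain lists of made-up probabilities instead of a real bgen-reader object and writes IDs plus dosages to CSV with Python's standard csv module, so the result can be read in R with read.csv(). The function names and values are hypothetical.

```python
import csv
import io


def dosage_from_probs(probs):
    """Expected count of the second allele for one sample.

    Assumes a biallelic variant in a diploid sample, with probabilities
    ordered [P(A1/A1), P(A1/A2), P(A2/A2)].
    """
    p_het, p_hom_a2 = probs[1], probs[2]
    return p_het + 2 * p_hom_a2


def write_dosage_csv(handle, sample_ids, prob_rows, snp_id):
    """Write a header row, then one row per sample: ID and dosage for one SNP."""
    if len(sample_ids) != len(prob_rows):
        raise ValueError("sample IDs and probability rows differ in length")
    writer = csv.writer(handle)
    writer.writerow(["ID", snp_id])
    for sid, row_probs in zip(sample_ids, prob_rows):
        writer.writerow([sid, round(dosage_from_probs(row_probs), 4)])


# Hypothetical probabilities for three samples at one variant.
ids = ["1001", "1002", "1003"]
probs = [[1.0, 0.0, 0.0], [0.1, 0.6, 0.3], [0.0, 0.0, 1.0]]

buffer = io.StringIO()  # use open("dosage.csv", "w", newline="") for a real file
write_dosage_csv(buffer, ids, probs, "22_42524947_C_T")
print(buffer.getvalue())
```

Rounding to four decimals only keeps the CSV compact; drop it if you want full precision.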

I will leave this thread open in case anyone has experience with the R package you mentioned.

Former User of DNAx Community_88
- 21 December 2022 09:18
Thank you so much, dear Chai; this will be so helpful.

Many thanks for sharing this with me.

Kind Regards

Former User of DNAx Community_28
- 21 December 2022 18:52
Alternatively, you can extract the SNP from the BGEN file with bgenix via the Swiss Army Knife app, then read the single-SNP BGEN with PLINK and export it as a .raw file. The .raw file contains the FID, IID and dosage for the SNP, and can easily be read into R or Python.
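In the PLINK .raw format (written by --recode A in PLINK 1.9 or --export A in PLINK 2), the header is FID IID PAT MAT SEX PHENOTYPE followed by one column per variant, named as the variant ID plus the counted allele, and missing values appear as NA. A minimal Python reader for that layout, on made-up text; the function name, column name and values are hypothetical. It keys rows by IID only, which assumes IIDs are unique (in UK Biobank files FID and IID are usually identical).

```python
import io


def read_raw(handle):
    """Read a PLINK .raw file into {IID: {variant_column: dosage or None}}."""
    header = handle.readline().split()
    variant_cols = header[6:]  # first six: FID IID PAT MAT SEX PHENOTYPE
    result = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        iid = fields[1]
        values = [None if v == "NA" else float(v) for v in fields[6:]]
        result[iid] = dict(zip(variant_cols, values))
    return result


# Hypothetical .raw content for one SNP (counted allele T).
raw_text = """FID IID PAT MAT SEX PHENOTYPE 22_42524947_C_T_T
1001 1001 0 0 1 -9 0
1002 1002 0 0 2 -9 1.02
1003 1003 0 0 1 -9 NA
"""
dosages = read_raw(io.StringIO(raw_text))
print(dosages["1002"])
```

In R, read.table(..., header = TRUE) on the same file gives an equivalent data frame with the IDs already attached.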

-Phil
