Question regarding batches

Skh

16 February 2024 10:13
3 comments

I've been looking at the pVCF files (e.g. the DRAGEN ones) and have some questions about batches and how the VCF files are split.
I saw in category 187 that there are, for example, different sequencing providers, different shipment batch numbers, etc.
Are there any recommendations from the UK Biobank or the community on whether any of these (or other QC fields) should be included as covariates in analyses of the WGS data?

Also, in this context, is there any meaning to the 'b' numbering of the VCF files (e.g. ukb24310_c22_b288_v1.vcf.gz, ukb24310_c22_b289_v1.vcf.gz, ...), or are they simply consecutive chunks of variants split for size reasons?

Comments

George F UKB Community team Data Analyst
- 12 March 2024 16:27
To answer the second part of your question: the “b” numbering of the pVCFs is positional. Each file lists all the variants within one chunk of chromosome co-ordinates (approx. 20,000 bp per chunk for field 24310). Note that areas of low variability are likely to produce header-only VCFs.
We are currently working on releasing an index file for field 24310, as well as creating a GitHub notebook so that researchers can create index files for other pVCF fields.
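
Until an official index is available, the chunk layout can be inspected directly. What follows is only a minimal sketch in Python, not the UK Biobank notebook. It assumes the filename pattern seen in the question (ukb<field>_c<chrom>_b<block>_v<version>.vcf.gz) holds for the files you have, and that they sit in one local directory. The names `parse_name`, `chunk_span` and `build_index` are hypothetical helpers introduced here for illustration.

```python
import gzip
import re
from pathlib import Path

# Pattern inferred from the filenames quoted in the question,
# e.g. ukb24310_c22_b288_v1.vcf.gz. Other fields may be named differently.
NAME_RE = re.compile(
    r"ukb(?P<field>\d+)_c(?P<chrom>[0-9XYMT]+)_b(?P<block>\d+)_v(?P<version>\d+)\.vcf\.gz$"
)


def parse_name(filename):
    """Split a pVCF chunk filename into field, chromosome, block and version."""
    match = NAME_RE.search(filename)
    if match is None:
        raise ValueError(f"unexpected pVCF filename: {filename}")
    return {
        "field": int(match["field"]),
        "chrom": match["chrom"],
        "block": int(match["block"]),
        "version": int(match["version"]),
    }


def chunk_span(path):
    """Return (first_pos, last_pos, n_records) for one chunk.

    A header-only file yields (None, None, 0).
    """
    first = last = None
    n_records = 0
    # bgzip-compressed files are valid multi-member gzip, so gzip.open reads them.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            # POS is the second tab-separated column of a VCF record.
            pos = int(line.split("\t", 2)[1])
            if first is None:
                first = pos
            last = pos
            n_records += 1
    return first, last, n_records


def build_index(directory):
    """Build one index row per chunk file found in `directory`."""
    rows = []
    for path in Path(directory).glob("*.vcf.gz"):
        row = parse_name(path.name)
        first, last, n_records = chunk_span(path)
        row.update(file=path.name, first_pos=first, last_pos=last, n_records=n_records)
        rows.append(row)
    rows.sort(key=lambda r: (r["chrom"], r["block"]))
    return rows
```

Calling `build_index("/path/to/pvcf_chunks")` (a placeholder path) returns one dictionary per file, and rows with `n_records == 0` are the header-only chunks mentioned above. Because pVCF lines are very wide, scanning every record this way is slow. It is meant to show the idea rather than to be run across a whole field.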

Rachael W UKB Community team Data Analyst
- 21 March 2024 11:15
This new post may be relevant: https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/17661209300637-Quality-control-and-metrics

Rachael W UKB Community team Data Analyst
- 09 May 2024 09:14
This new post may be relevant: https://community.ukbiobank.ac.uk/hc/en-gb/community/posts/18672196405277
