Are the 500K WGS (GATK) and 200K WGS CRAM files identical for the same sample IDs, even though they have different field IDs?
How much difference should we expect between the GATK CRAMs and the DRAGEN CRAMs in the 500K WGS release? Should we use the DRAGEN CRAMs?
Comments
@UK Biobank DA Team?
I'll answer my own question in case anyone else is looking into this. For the same EIDs, the GATK CRAM files are identical between the earlier 200K WGS release and the newer 500K WGS release, with the exception of 80+ CRAMs that were remapped in late 2022. This is based on the file-ID numbers, which should be unique to each CRAM.
The DRAGEN and GATK pipelines that produce the CRAMs seem to differ mainly in processing speed: the Illumina DRAGEN mapper seems much faster. In the variant-calling step DRAGEN also seems to perform better, but that was not my question.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9748128/
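For anyone who wants to repeat the file-ID check, here is a minimal sketch in Python. It assumes you have already exported, for each release, a mapping from EID to the file ID of that participant's GATK CRAM. The function name `compare_file_ids` and all EIDs and file IDs below are made up for illustration; how you obtain the real mappings depends on your own project setup.

```python
from typing import Dict, List, Tuple


def compare_file_ids(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (unchanged, changed) EIDs among those present in both releases.

    An EID counts as unchanged when both releases point at the same file ID,
    which is taken here as evidence that the CRAM itself is the same file.
    """
    shared = sorted(old.keys() & new.keys())
    unchanged = [eid for eid in shared if old[eid] == new[eid]]
    changed = [eid for eid in shared if old[eid] != new[eid]]
    return unchanged, changed


# Hypothetical EID -> file-ID mappings; these are not real UK Biobank values.
release_200k = {
    "1000001": "file-AAA",
    "1000002": "file-BBB",
    "1000003": "file-CCC",
}
release_500k = {
    "1000001": "file-AAA",
    "1000002": "file-ZZZ",  # stands in for a CRAM that was remapped
    "1000003": "file-CCC",
    "1000004": "file-DDD",  # only in the newer release, so not compared
}

unchanged, changed = compare_file_ids(release_200k, release_500k)
print(f"{len(unchanged)} unchanged, {len(changed)} changed: {changed}")
```

EIDs that appear in only one release are ignored, since there is nothing to compare them against. The EIDs in `changed` would be the candidates for the remapped CRAMs.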
Thank you for sharing that answer.
Researchers might also find section 3 of the FAQs relevant; see https://www.ukbiobank.ac.uk/media/dovbae03/uk-biobank-final-whole-genome-sequencing-release-faqs_v1-0.pdf
Please could you help me to access WGS 200k from malignant melanoma?