How can I find the specific genetic data I need in the compressed WGS and WES files, given that they cannot simply be opened and inspected? For example, for prostate cancer and thyroid cancer, do I need to apply particular tools, and is there any relevant code?

Comments

2 comments

  • Comment author
    Anastazie Sedlakova DNAnexus Team

    Hello, here is a Jupyter notebook that extracts field 41270 (Summary ICD10 diagnosis).

    Here is my proposed workflow to extract, let's say, thyroid cancer cases (ICD10 code C73):

    1. Extract field 41270 and select participants based on your criteria (e.g. whether or not a participant has C73). The output of this step is a list of EIDs to select.
    2. WES:
      1. You can adapt this QC WDL script by selecting the BGEN files (located in /Bulk/Exome sequences/Population level exome OQFE variants, BGEN format - final release/) and adding a --keep parameter pointing to the file that contains the EIDs to select.
    3. WGS:
      1. Currently, the results are in pVCF format only, so you may want to use bcftools. I do not have personal experience with this approach; you can look at the discussion on Biostars.
      2. Since there is one pVCF per region of each chromosome, you may want to parallelize your work by writing a WDL script. Here is the documentation on how to use WDL on UKB RAP.
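
The selection step and the bcftools subsetting step above can be sketched roughly as follows. This is only an illustration under several assumptions that do not come from the thread: the field 41270 export is a CSV with columns named `eid` and `p41270`, each cell holds a bracketed or pipe-separated list of ICD10 codes, the `--keep` file uses two tab-separated columns (FID and IID, both set to the EID), and the sample IDs in the pVCF match the EIDs. The file names are hypothetical. C61 is the ICD10 code for prostate cancer.

```python
import csv
import io
import re


def parse_codes(cell):
    """Split one field 41270 cell into a list of ICD10 codes.

    Assumes list-like exports such as "[C73, E119]" or "C73|E119".
    """
    if not cell:
        return []
    cleaned = cell.strip().strip("[]").replace("'", "").replace('"', "")
    return [code.strip() for code in re.split(r"[|,]", cleaned) if code.strip()]


def select_eids(rows, prefix, code_col="p41270", id_col="eid"):
    """Return EIDs of participants with at least one code starting with `prefix`."""
    return [
        row[id_col]
        for row in rows
        if any(code.startswith(prefix) for code in parse_codes(row.get(code_col)))
    ]


def format_keep_lines(eids):
    """Format EIDs as two tab-separated columns (FID, IID) for a --keep file.

    The two-column layout is an assumption; check what your tool expects.
    """
    return "".join(f"{eid}\t{eid}\n" for eid in eids)


def bcftools_subset_cmd(pvcf, samples_file, out_vcf):
    """Build a `bcftools view` command that keeps only the listed samples.

    `samples_file` holds one sample ID per line. --force-samples makes
    bcftools skip IDs that are absent from this pVCF instead of failing.
    """
    return [
        "bcftools", "view",
        "--samples-file", samples_file,
        "--force-samples",
        "--output-type", "z",
        "--output", out_vcf,
        pvcf,
    ]


# Tiny made-up table standing in for the real field 41270 export.
demo_csv = """eid,p41270
1000001,"[C73, E119]"
1000002,[I10]
1000003,
1000004,[C61]
"""
rows = list(csv.DictReader(io.StringIO(demo_csv)))

thyroid_eids = select_eids(rows, "C73")   # thyroid cancer
prostate_eids = select_eids(rows, "C61")  # prostate cancer
keep_text = format_keep_lines(thyroid_eids)

# Hypothetical file names, for illustration only.
cmd = bcftools_subset_cmd("region_chunk.vcf.gz", "thyroid_eids.txt", "thyroid_subset.vcf.gz")
print(thyroid_eids, prostate_eids)
print(" ".join(cmd))
```

The command list can be passed to `subprocess.run(cmd, check=True)` where bcftools is installed, once per pVCF chunk; that per-chunk call is the part a WDL script would parallelize.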
  • Thanks!

    What genetic data is contained in the whole genome CRAM files, and is there any analysis code associated with them?

    Kind regards.

