Hi everyone, I am running Regenie from the RAP command line. I got this error: "set file : [ukb23158_500k_OQFE.sets.txt.gz] ERROR: unknown chromosome code in set list file". I am using the files already available on the UK Biobank RAP.

Comments


  • I managed to get past this error by saving the set file as a tab-separated file and removing chromosome 24. I re-ran Step 2 and now get this error:

    "WARNING: Detected 423 sets with variants not in genetic data or annotation files.
    WARNING: Detected 18525 sets with only unknown variants (these are ignored).
    +report on burden input files written to [breast_cancer_assoc_burden_masks_report.txt]
    -keeping only specified sets
    ERROR: no set left to include in analysis"

    Does anyone know the cause of this error? Below is the code I used:

     

    dx run swiss-army-knife \
    -iin="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ukb23158_c22_b0_v1.bed" \
    -iin="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ukb23158_c22_b0_v1.bim" \
    -iin="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ukb23158_c22_b0_v1.fam" \
    -iin="/Project_BreastCancer/Data/Regenie/breast_cancer_step1_pred.list" \
    -iin="/Project_BreastCancer/Data/Regenie/breast_cancer_step1_1.loco" \
    -iin="/Project_BreastCancer/Data/Pheno_Data_participant_wes_bc.tsv" \
    -iin="/Project_BreastCancer/Data/tmp/ukb23158_500k_OQFE_tab_noY.sets.tsv.gz" \
    -iin="/Project_BreastCancer/Data/tmp/ukb23158_500k_OQFE_tab.annotations.txt.gz" \
    -iin="/Project_BreastCancer/Data/tmp/custom_masks.txt" \
    -iin="/Project_BreastCancer/Data/tmp/ListaGeniBC_Genturis_annotUKBB_chr22.txt" \
    -icmd="regenie --step 2 \
    --bt \
    --lowmem \
    --check-burden-files \
    --pred breast_cancer_step1_pred.list \
    --bed ukb23158_c22_b0_v1 \
    --phenoFile Pheno_Data_participant_wes_bc.tsv \
    --covarFile Pheno_Data_participant_wes_bc.tsv \
    --phenoCol breast_cancer \
    --covarColList age,sex,genetic_PC_array{1:10} \
    --set-list ukb23158_500k_OQFE_tab_noY.sets.tsv.gz \
    --anno-file ukb23158_500k_OQFE_tab.annotations.txt.gz \
    --mask-def custom_masks.txt \
    --extract-sets ListaGeniBC_Genturis_annotUKBB_chr22.txt \
    --firth --approx \
    --firth-se \
    --aaf-bins 0.05,0.01,0.001 \
    --bsize 1000 \
    --out breast_cancer_assoc_burden \
    --threads 16 --gz" \
    --tag="Step 2" \
    --instance-type "mem1_ssd1_v2_x16" \
    --destination="/Project_BreastCancer/Data/Regenie/" --brief --yes

     

    This is the head of the file ListaGeniBC_Genturis_annotUKBB_chr22.txt:

    LZTR1(ENSG00000099949)
    SMARCB1(ENSG00000099956)
    CHEK2(ENSG00000183765)
    NF2(ENSG00000186575)
    PDGFB(ENSG00000100311)
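One way to narrow down "ERROR: no set left to include in analysis" is to check whether the names passed to --extract-sets match the first column of the set list exactly. The following is only a minimal sketch: it assumes the set list is whitespace-delimited with the set name in the first field, the helper names `read_set_names` and `split_by_presence` are hypothetical, and the set-list lines and variant IDs in the example are made up for illustration. It may not point to the actual cause of the error.

```python
def read_set_names(lines):
    """Return the first whitespace-delimited field of every non-empty line."""
    return {line.split()[0] for line in lines if line.strip()}


def split_by_presence(wanted, available):
    """Split the wanted set names into those found and those missing."""
    found = sorted(wanted & available)
    missing = sorted(wanted - available)
    return found, missing


# Made-up set-list lines (set name, chromosome, position, variant list).
set_list_lines = [
    "CHEK2(ENSG00000183765) 22 100 22:100:A:G,22:110:C:T",
    "GENE_X(ENSG_HYPOTHETICAL) 22 200 22:200:G:A",
]
# Names as they would appear in the --extract-sets file.
extract_lines = [
    "CHEK2(ENSG00000183765)",
    "NF2(ENSG00000186575)",
]

found, missing = split_by_presence(
    read_set_names(extract_lines), read_set_names(set_list_lines)
)
print("found in set list:", found)
print("missing from set list:", missing)
```

With the real files, the lines could be read via `gzip.open(path, "rt")` for the set list and `open(path)` for the extract list. If every requested name ends up in `missing`, a naming or delimiter mismatch is worth ruling out first.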

     

    Thanks for the help!

     

  • Comment author
    Jovia Nierenberg

    I needed to rewrite this set file, restricting it to autosomes, for Regenie to work. I used a Jupyter notebook on the RAP with the code below, then used the new set file in my Regenie command.

    import pandas as pd

    # read set file, only keep autosomes
    #! dx download "path_to_helper_files/ukb23158_500k_OQFE.sets.txt.gz" # can only read once per session, otherwise need to delete the file
    set_col_names = ['Gene', 'Chromosome', 'Position', 'Set']
    autosome_range = list(range(1, 23))
    sets = (
        pd.read_csv('ukb23158_500k_OQFE.sets.txt.gz',
                    compression='gzip', delimiter='\t', names=set_col_names)
        .query('Chromosome in @autosome_range')
    )

    # write autosomes set file
    sets.to_csv('ukb23158_500k_OQFE_autosomes.sets.txt.gz', compression='gzip', sep='\t',
                header=False, index=False)
    #! dx upload ukb23158_500k_OQFE_autosomes.sets.txt.gz --path /path_to_somewhere_in_your_project/ukb23158_500k_OQFE_autosomes.sets.txt.gz # can only run once per session
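To see which chromosome codes a set file actually contains, before or after filtering, a quick tally of the second column can help. This is a rough sketch using only the standard library; it assumes the set list is whitespace-delimited with the chromosome in the second field, the function names are hypothetical, and the example lines are made up.

```python
import gzip
from collections import Counter


def chromosome_counts(lines):
    """Tally the second whitespace-delimited field (chromosome) of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            counts[fields[1]] += 1
    return counts


def chromosome_counts_from_gz(path):
    """Same tally, reading a gzip-compressed set list from disk."""
    with gzip.open(path, "rt") as handle:
        return chromosome_counts(handle)


# Made-up example lines (set name, chromosome, position, variant list).
example = [
    "GENE_A(ENSG_HYPOTHETICAL_1) 1 1000 1:1000:A:G,1:1010:C:T",
    "GENE_B(ENSG_HYPOTHETICAL_2) 22 2000 22:2000:G:A",
    "GENE_C(ENSG_HYPOTHETICAL_3) 24 3000 24:3000:T:C",
]
counts = chromosome_counts(example)
print(dict(counts))
```

On a real file, `chromosome_counts_from_gz('ukb23158_500k_OQFE_autosomes.sets.txt.gz')` should list only codes 1 to 22 if the filtering above worked as intended.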

