Hi everyone,
I am running Regenie from the RAP command line and got this error: "set file : [ukb23158_500k_OQFE.sets.txt.gz] ERROR: unknown chromosome code in set list file". I am using the files already available on the UK Biobank RAP.
Comments
I managed to work around this error by saving the set file as a tab-separated file and removing chr 24. I re-ran Step 2 and now get this error: " WARNING: Detected 423 sets with variants not in genetic data or annotation files.
WARNING: Detected 18525 sets with only unknown variants (these are ignored).
+report on burden input files written to [breast_cancer_assoc_burden_masks_report.txt]
-keeping only specified sets
ERROR: no set left to include in analysis". Does anyone know the cause of this error? Below is the code I used:
dx run swiss-army-knife \
-iin="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ukb23158_c22_b0_v1.bed" \
-iin="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ukb23158_c22_b0_v1.bim" \
-iin="/Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - final release/ukb23158_c22_b0_v1.fam" \
-iin="/Project_BreastCancer/Data/Regenie/breast_cancer_step1_pred.list" \
-iin="/Project_BreastCancer/Data/Regenie/breast_cancer_step1_1.loco" \
-iin="/Project_BreastCancer/Data/Pheno_Data_participant_wes_bc.tsv" \
-iin="/Project_BreastCancer/Data/tmp/ukb23158_500k_OQFE_tab_noY.sets.tsv.gz" \
-iin="/Project_BreastCancer/Data/tmp/ukb23158_500k_OQFE_tab.annotations.txt.gz" \
-iin="/Project_BreastCancer/Data/tmp/custom_masks.txt" \
-iin="/Project_BreastCancer/Data/tmp/ListaGeniBC_Genturis_annotUKBB_chr22.txt" \
-icmd="regenie --step 2 \
--bt \
--lowmem \
--check-burden-files \
--pred breast_cancer_step1_pred.list \
--bed ukb23158_c22_b0_v1 \
--phenoFile Pheno_Data_participant_wes_bc.tsv \
--covarFile Pheno_Data_participant_wes_bc.tsv \
--phenoCol breast_cancer \
--covarColList age,sex,genetic_PC_array{1:10} \
--set-list ukb23158_500k_OQFE_tab_noY.sets.tsv.gz \
--anno-file ukb23158_500k_OQFE_tab.annotations.txt.gz \
--mask-def custom_masks.txt \
--extract-sets ListaGeniBC_Genturis_annotUKBB_chr22.txt \
--firth --approx \
--firth-se \
--aaf-bins 0.05,0.01,0.001 \
--bsize 1000 \
--out breast_cancer_assoc_burden \
--threads 16 --gz" \
--tag="Step 2" \
--instance-type "mem1_ssd1_v2_x16" \
--destination="/Project_BreastCancer/Data/Regenie/" --brief --yes
This is the head of the file ListaGeniBC_Genturis_annotUKBB_chr22.txt:
LZTR1(ENSG00000099949)
SMARCB1(ENSG00000099956)
CHEK2(ENSG00000183765)
NF2(ENSG00000186575)
PDGFB(ENSG00000100311)
Thanks for the help!
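One way to narrow down the "no set left" error is to check how many names in the --extract-sets file appear in the first column of the set list, and how many of each matching set's variant IDs appear in the .bim file. The sketch below is only a diagnostic, not a confirmed explanation of the error. It assumes the set list is whitespace-delimited with the columns set name, chromosome, position, and a comma-separated variant list, and that the variant ID sits in the second column of the .bim. The function name check_set_overlap and the sample rows (positions and variant IDs) are made up for illustration.

```python
def check_set_overlap(set_list_lines, extract_names, bim_lines):
    """Report which requested sets exist and how many of their variants are in the .bim.

    Assumes each set-list line is: name, chromosome, position, comma-separated variant IDs.
    Assumes each .bim line carries the variant ID in its second column.
    """
    bim_ids = {line.split()[1] for line in bim_lines if line.strip()}
    wanted = {name.strip() for name in extract_names if name.strip()}

    report = {}
    for line in set_list_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] not in wanted:
            continue
        variants = fields[3].split(",")
        found = sum(v in bim_ids for v in variants)
        report[fields[0]] = (found, len(variants))  # (variants found, variants listed)

    # Requested names that never appear as a set name in the set list
    missing = sorted(wanted - set(report))
    return report, missing


# Made-up rows, for illustration only
set_lines = [
    "CHEK2(ENSG00000183765) 22 1000 22:100:A:G,22:200:C:T",
    "NF2(ENSG00000186575) 22 2000 22:300:G:A",
]
bim = ["22 22:100:A:G 0 100 G A", "22 22:999:T:C 0 999 C T"]
requested = ["CHEK2(ENSG00000183765)", "PDGFB(ENSG00000100311)"]

report, missing = check_set_overlap(set_lines, requested, bim)
print(report)   # {'CHEK2(ENSG00000183765)': (1, 2)}
print(missing)  # ['PDGFB(ENSG00000100311)']
```

With the real files, the lines could be read with gzip.open(path, "rt") for the set list and plain open() for the .bim and the extract file. A requested name listed under missing, or a set with zero variants found, would each be worth checking against the exact naming used in the set list and the .bim.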
I needed to rewrite this file, restricting it to autosomes, for Regenie to work. I used a Jupyter notebook on the RAP with the code below, then used the new set file in my Regenie command.
import pandas as pd
# read set file, only keep autosomes
#! dx download "path_to_helper_files/ukb23158_500k_OQFE.sets.txt.gz" # can only read once per session, otherwise need to delete the file
set_col_names = ['Gene', 'Chromosome', 'Position', 'Set']
autosome_range = list(range(1, 23))
sets = (
    pd.read_csv('ukb23158_500k_OQFE.sets.txt.gz',
                compression='gzip', delimiter='\t', names=set_col_names)
    .query('Chromosome in @autosome_range')
)
# write autosomes set file
sets.to_csv('ukb23158_500k_OQFE_autosomes.sets.txt.gz', compression='gzip', sep='\t',
            header=False, index=False)
#! dx upload ukb23158_500k_OQFE_autosomes.sets.txt.gz --path /path_to_somewhere_in_your_project/ukb23158_500k_OQFE_autosomes.sets.txt.gz # can only run once per session
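One caveat with the filter above: if pandas reads the chromosome column as strings (for example because the file contains codes such as X or Y), comparing against a list of integers matches nothing. A minimal sketch of a filter that compares as strings instead is below. The helper name keep_autosomes and the sample rows are made up for illustration; the column names follow the ones used above.

```python
import io

import pandas as pd

SET_COL_NAMES = ["Gene", "Chromosome", "Position", "Set"]
AUTOSOMES = {str(c) for c in range(1, 23)}


def keep_autosomes(sets: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose chromosome code is 1-22, whether stored as int or str."""
    chrom = sets["Chromosome"].astype(str).str.strip()
    return sets[chrom.isin(AUTOSOMES)]


# Made-up rows in the four-column layout, for illustration only
example = io.StringIO(
    "GENE_A\t1\t100\t1:100:A:G\n"
    "GENE_B\t23\t200\t23:200:C:T\n"
    "GENE_C\tY\t300\tY:300:G:A\n"
)
# Reading the column as str avoids mixed int/str values in one column
sets = pd.read_csv(example, sep="\t", names=SET_COL_NAMES, dtype={"Chromosome": str})
autosomal = keep_autosomes(sets)
print(autosomal["Gene"].tolist())  # ['GENE_A']
```

The filtered frame can then be written out with to_csv exactly as in the code above.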