Burden test from regenie in UKB WES data; the result does not contain Alleles or BETA and SE
Hi,
I was trying to run a burden test on the WES data using dx and Swiss Army Knife with the following code:
for i in {1..22}; do
run_regenie_burden="regenie \
--step 2 --pred fit_snps_chr${i}_pred.list \
--bgen ukb23159_c${i}_b0_v1.bgen \
--ref-first \
--sample ukb23159_c${i}_b0_v1.sample \
--phenoFile lipid_pheno_sep13_with_labels_caucasian_norealtive.csv \
--covarFile covarite_sep23_age_sex_20pc.csv \
--set-list ukb23158_500k_OQFE.sets.txt.gz \
--anno-file ukb23158_500k_OQFE.annotations.txt.gz \
--mask-def ldl_custom_masks.txt \
--aaf-bins 0.01,0.001 --nauto 23 \
--bsize 200 --extract-sets nmr_gene_sets_ready.txt \
--write-mask-snplist \
--vc-tests skato,acato-full \
--out metabolimics_skato_acato_chr${i}"
dx run swiss-army-knife -iin="${bgen_prefix}/${data_field_bgen}_c${i}_b0_v1.bgen" \
-iin="${bgen_prefix}/${data_field_bgen}_c${i}_b0_v1.sample" \
-iin="${bgen_prefix}/${data_field_bgen}_c${i}_b0_v1.bgen.bgi" \
-iin="${phenotype_file}" \
-iin="${covariate_file}" \
-iin="${ldl_custom}" \
-iin="${ready_snp_list}" \
-iin="${pred_folder}/fit_snps_chr${i}_pred.list" \
-iin="${pred_folder}/loco.zip" \
-iin="${path_to_500kwes_helper_files}/ukb23158_500k_OQFE.sets.txt.gz" \
-iin="${path_to_500kwes_helper_files}/ukb23158_500k_OQFE.annotations.txt.gz" \
-icmd="unzip loco.zip ; ${run_regenie_burden}" --tag="burden_nmr1" --instance-type "mem2_ssd1_v2_x96" \
--destination="${project}:${data_out1}" --brief --yes
done
It gives me results as expected. However, the results file contains ALLELE0 and ALLELE1 columns whose values are not alleles (not A, C, G, or T) (attached). It also contains BETA and SE columns, but in many cases they are empty. Am I doing everything right? Do you have any suggestions to improve the code? Please help me.
Comments
@Anastazie Sedlakova, do you have any ideas on this topic, please?
> The results file contains ALLELE0 and ALLELE1 columns but the values are not alleles (not A, C, G, or T) (attached).
Because you are running a burden test, regenie builds masks by combining variants. The results table therefore does not show alleles but the mask that was used, e.g. M2 singleton.
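If it helps to see which mask labels actually appear in the output, here is a minimal sketch that tallies the distinct ALLELE0/ALLELE1 pairs. It assumes the results file is whitespace-delimited with a header row that names the ALLELE0 and ALLELE1 columns; the function name `list_mask_labels` and the file name in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# Tally the distinct ALLELE0/ALLELE1 pairs in a regenie results file.
# Assumption: whitespace-delimited file with a header row naming the columns.
list_mask_labels() {
  awk '
    /^##/ { next }                          # skip comment lines, if any
    !seen_header {                          # first remaining line is the header
      for (i = 1; i <= NF; i++) col[$i] = i
      seen_header = 1
      next
    }
    { count[$(col["ALLELE0"]) "\t" $(col["ALLELE1"])]++ }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }
  ' "$1" | LC_ALL=C sort
}

# Hypothetical usage (replace with your own output file):
# list_mask_labels metabolimics_skato_acato_chr1_<phenotype>.regenie
```

Each output line holds the ALLELE0 value, the ALLELE1 value, and the number of rows carrying that pair, so mask labels can be told apart from real alleles at a glance.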
> It also contains BETA and SE columns but in many cases they are empty.
Looking at your file, I see that BETA and SE are NA for the ADD-ACATO, ADD-ACATV, ADD-SKAT and ADD-SKATO tests. It seems that regenie does not produce an effect size for joint variant tests, as is noted here.
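To check this pattern in your own results, a small sketch along these lines counts, for each value in the TEST column, how many rows there are and how many of them have BETA set to NA. It assumes a whitespace-delimited file with a header row that names the TEST and BETA columns; the function name `count_na_beta_by_test` and the file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# For each TEST value, print: test name, total rows, rows where BETA is NA.
# Assumption: whitespace-delimited file with a header row naming the columns.
count_na_beta_by_test() {
  awk '
    /^##/ { next }                          # skip comment lines, if any
    !seen_header {                          # first remaining line is the header
      for (i = 1; i <= NF; i++) col[$i] = i
      seen_header = 1
      next
    }
    {
      t = $(col["TEST"])
      total[t]++
      if ($(col["BETA"]) == "NA") na[t]++
    }
    END { for (t in total) printf "%s\t%d\t%d\n", t, total[t], na[t] }
  ' "$1" | LC_ALL=C sort
}

# Hypothetical usage (replace with your own output file):
# count_na_beta_by_test metabolimics_skato_acato_chr1_<phenotype>.regenie
```

If the third column equals the second for a given test, that test reports no effect size in any row of the file.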
Thank you so much. Is the A1FREQ (allele frequency) column also empty for the same reason?
Yes, I think so.
Hi Anastazie,
Thank you. I have another question.
The first two columns of the output are "CHR" and "GENPOS". What does "GENPOS" indicate: the position of a variant or of the gene? If it is the position of a variant, that is fine. If it is the position of the gene, which position is it (start, end, etc.)?
Best, Zillur
I did not find this stated explicitly in the regenie documentation, but I'm assuming that GENPOS is pulled from the set file (ukb23158_500k_OQFE.sets.txt.gz), where, in my opinion, the position is defined by the position of the first variant.
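One way to test that guess is to compare the position stored in the set file with the smallest variant position in each set. The sketch below assumes each line of the set file has four whitespace-separated fields (set name, chromosome, position, comma-separated variant list) and that variant IDs look like chr:pos:ref:alt. Both are assumptions to verify against your file, and the function name `compare_set_position` is made up.

```shell
#!/usr/bin/env bash
# Read a set list on stdin and print, per set:
# set name, position listed in the file, smallest position among its variants.
# Assumptions: fields are "name chr pos var1,var2,..." and IDs are chr:pos:ref:alt.
compare_set_position() {
  awk '
    {
      n = split($4, variants, ",")
      min_pos = ""
      for (i = 1; i <= n; i++) {
        split(variants[i], parts, ":")
        pos = parts[2] + 0                  # force numeric comparison
        if (min_pos == "" || pos < min_pos) min_pos = pos
      }
      printf "%s\t%s\t%s\n", $1, $3, min_pos
    }
  '
}

# Usage on the helper file from this thread:
# gzip -dc ukb23158_500k_OQFE.sets.txt.gz | compare_set_position | head
```

If the second and third output columns agree for every gene, the listed position is the first variant's position; comparing the second column with GENPOS in the results would then confirm where GENPOS comes from.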