Calculating sample sizes and cross-tab queries

#CheadleDA UK Biobank Data Analysts

The guidelines below will help us answer cross-tab queries efficiently.

 

A cross-tab query requests counts for combinations of two or more variables in order to determine sample sizes and the overlap between variables, e.g., how many participants have both a certain disease and imaging data. In a genetic cross-tab, one of the variables relates to genetics, such as a single nucleotide polymorphism (SNP). Cross-tabs are provided solely to establish the feasibility of a study. When submitting cross-tab queries, please follow these guidelines:

 

- Access to UKB-RAP: Before submitting a query, please check that you do not already have access to the UK Biobank Research Analysis Platform (UKB-RAP).

- Specific SNP Information: For genetic cross-tabs, kindly provide a specific SNP in one of the following formats: 

           - rsID

           - Chromosome, base pair location, and allele change (e.g., Chr13-32398489-A-T) 

           - Please express chromosomal positions according to hg38 (Genome Reference Consortium Human Build 38).

- Whole exome sequencing data: Currently, we can only carry out cross-tabs for SNPs or chromosomal positions found within the whole exome sequencing (WES) data. In the future, we hope to extend this to the whole genome sequencing (WGS) data. As an alternative, the genomic search tool on the UK Biobank Showcase can be used to query the genotyping data, and the allele frequency browser can be used to query the WGS data.

- Field IDs for Other Variables: For most other variables, please provide specific field IDs. Notable exceptions include: 

              - Imaging (e.g., Heart MRI, Brain MRI) 

              - Biomarkers (e.g., "How many participants with proteomics/metabolomics/blood count data have ...?")

              - In these cases, we will use one of the field IDs as a proxy. 

 

- Specific Answers or Ranges: If you are interested in specific answers or ranges within a field, please specify these values. For instance, if a field has responses ranging from "very often true" to "never true," please indicate the values of interest (e.g. participants answering "never true" or "rarely true"). 

 

- Specific Instance: If you are interested in numbers from a specific instance, please specify it, for example the baseline assessment, the repeat of the baseline assessment, the imaging visit, or the repeat imaging visit.

 

- Diagnosis and Procedure Information: If you would like information on the number of participants with a particular diagnosis or who have undergone a specific procedure, please provide: 

            - Specific field ID(s) 

            - For hospital inpatient data, cancer registry data, or death record data, specific ICD-10 code(s) 

            - For operations and procedures, specific OPCS4 code(s) 

            - For primary care data, specific Read/BNF/dm+d code(s); see Resource 591 and Resource 592.
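
Before submitting a genetic cross-tab, it may help to check that the SNP is written in one of the two formats requested above. The following is a minimal sketch in Python, not an official UK Biobank tool. The function name `classify_snp`, the `Variant` fields, and the accepted chromosome labels (1 to 22, X, Y, M/MT) are assumptions made for illustration. The check only looks at the shape of the text; it cannot confirm that a position is expressed in hg38.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical container for a positional SNP description."""
    chrom: str
    pos: int
    allele_from: str  # first allele in the text, e.g. "A" in A-T
    allele_to: str    # second allele in the text, e.g. "T" in A-T


# "rs" followed by digits, e.g. rs123
RSID_RE = re.compile(r"^rs\d+$", re.IGNORECASE)
# Chromosome-position-allele-allele, e.g. Chr13-32398489-A-T
# (assumption: the "Chr" prefix is optional and case is ignored)
POS_RE = re.compile(
    r"^(?:chr)?(\d{1,2}|X|Y|MT?)-(\d+)-([ACGT]+)-([ACGT]+)$",
    re.IGNORECASE,
)


def classify_snp(text: str):
    """Return ("rsid", id), ("position", Variant) or ("invalid", None)."""
    text = text.strip()
    if RSID_RE.match(text):
        return "rsid", text.lower()
    match = POS_RE.match(text)
    if match:
        chrom, pos, allele_from, allele_to = match.groups()
        chrom = chrom.upper()
        # Numeric chromosome labels must be autosomes 1 to 22.
        if chrom.isdigit() and not 1 <= int(chrom) <= 22:
            return "invalid", None
        return "position", Variant(
            chrom, int(pos), allele_from.upper(), allele_to.upper()
        )
    return "invalid", None


if __name__ == "__main__":
    for example in ("Chr13-32398489-A-T", "rs123", "13:32398489"):
        print(example, "->", classify_snp(example))
```

The example string `Chr13-32398489-A-T` is taken from the guideline above; `rs123` and `13:32398489` are made-up inputs showing an accepted rsID shape and a rejected format.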

 
 

We appreciate your attention to these guidelines, as they will help ensure a more efficient process for both you and our Data Analysis team.
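
To make the idea of a cross-tab concrete, the sketch below counts every combination of two yes/no variables, using the disease and imaging example from the definition above. The participant records are entirely synthetic, and the variable names `has_disease` and `has_imaging` and the helper `cross_tab` are hypothetical. It illustrates the kind of counts a cross-tab returns, not how the Data Analysis team produces them.

```python
from collections import Counter

# Entirely synthetic records for illustration; not UK Biobank data.
participants = [
    {"id": 1, "has_disease": True, "has_imaging": True},
    {"id": 2, "has_disease": True, "has_imaging": False},
    {"id": 3, "has_disease": False, "has_imaging": True},
    {"id": 4, "has_disease": False, "has_imaging": False},
    {"id": 5, "has_disease": True, "has_imaging": True},
]


def cross_tab(rows, var_a, var_b):
    """Count rows for each (var_a, var_b) combination of values."""
    return Counter((row[var_a], row[var_b]) for row in rows)


counts = cross_tab(participants, "has_disease", "has_imaging")

if __name__ == "__main__":
    # Print one line per cell of the 2x2 table.
    for disease in (True, False):
        for imaging in (True, False):
            n = counts[(disease, imaging)]
            print(f"disease={disease!s:<5} imaging={imaging!s:<5} n={n}")
```

In this toy data set, the cell of interest for a feasibility check is `counts[(True, True)]`, the number of synthetic participants with both the disease and imaging data.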

 

If you have any questions or need further assistance, please submit a ticket.

 
