How to store large genotype/phenotype data on the RAP so it can be queried quickly from custom code?

We need to annotate all UKB WES and WGS variants with VEP and additional features, such as pathogenicity scores from dbNSFP, and organize the annotations and carrier information in a database that can be queried quickly from our custom code (Python and R) running on the RAP. Similarly, we want to process all the UKB phenotypic data (e.g. correcting for covariates) and store the results in a database format that can be queried quickly from the same code. So far we have used SQLite files to store variant annotations and carrier information, but this will likely not be suitable for the WGS data. What is the best way to store data of this size on the RAP so that it can be queried quickly?
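
To make the access pattern concrete, here is a minimal sketch of the kind of lookup involved, using Python's built-in sqlite3 module. The table and column names (variants, carriers, score and so on), the helper functions, and the rows are all hypothetical placeholders chosen for illustration; they are not our real schema or real UKB data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a hypothetical two-table layout."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- One row per variant, holding its annotations (assumed columns).
        CREATE TABLE variants (
            variant_id  TEXT PRIMARY KEY,
            gene        TEXT,
            consequence TEXT,
            score       REAL
        );
        -- One row per (variant, carrier) pair (assumed columns).
        CREATE TABLE carriers (
            variant_id TEXT REFERENCES variants(variant_id),
            sample_id  INTEGER,
            genotype   INTEGER
        );
        CREATE INDEX idx_variants_gene ON variants(gene);
        CREATE INDEX idx_carriers_variant ON carriers(variant_id);
        """
    )
    # Placeholder rows only, not real variants or participants.
    con.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?)",
        [
            ("1:1000:A:G", "GENE_A", "missense_variant", 0.91),
            ("1:2000:C:T", "GENE_A", "synonymous_variant", 0.05),
            ("2:3000:G:A", "GENE_B", "stop_gained", 0.99),
        ],
    )
    con.executemany(
        "INSERT INTO carriers VALUES (?, ?, ?)",
        [
            ("1:1000:A:G", 101, 1),
            ("1:1000:A:G", 102, 2),
            ("1:2000:C:T", 101, 1),
            ("2:3000:G:A", 103, 1),
        ],
    )
    con.commit()
    return con


def carriers_for_gene(con, gene, min_score):
    """Return (sample_id, variant_id, genotype) rows for variants in `gene`
    whose annotation score is at least `min_score`."""
    return con.execute(
        """
        SELECT c.sample_id, c.variant_id, c.genotype
        FROM variants AS v
        JOIN carriers AS c ON c.variant_id = v.variant_id
        WHERE v.gene = ? AND v.score >= ?
        ORDER BY c.sample_id, c.variant_id
        """,
        (gene, min_score),
    ).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    for row in carriers_for_gene(con, "GENE_A", 0.5):
        print(row)
```

It is this kind of gene-level join between annotations and carriers that we need to stay fast at WGS scale.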

