What would you suggest as the best way to install software from GitHub on the RAP and use it with the RAP data? Using a Spark notebook? If so, do we have to download the data from the RAP into the Spark notebook environment every time?
What would you say is the most efficient way to handle small parts of multiple pVCFs for the same samples?
E.g., I want to analyze specific genes (and I need their variants' quality values as well), but they are split across ten different pVCFs. Is the easiest and most time- and cost-efficient approach to download all 10 pVCFs to a Spark notebook and run the analyses there, or to first make a smaller VCF containing all the genes outside of a Spark notebook and then load that smaller VCF into the Spark notebook?
You can use git from either a cloud workstation or a Jupyter notebook. Personally, I use git from both.
If you are talking about <10 GB of data, you can use the Create Snapshot function in JupyterLab so you don't have to move the data each time.
In a similar vein, you can use the dx-snapshot function in a cloud workstation and boot from there.
I'd use bedtools or vcftools in the Swiss Army Knife app. [Shameless plug] I'm going to be in a webinar with Regeneron and NVIDIA on 2/17, and I'll go through how to subset data using bedtools.
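For anyone who wants to try the bedtools route before the webinar, here is a rough sketch of how the subsetting commands could be assembled. It only builds command strings and does not run anything. The file paths, the `subset_N.vcf` output names, and both helper functions are hypothetical; the `-iin` and `-icmd` inputs are my understanding of how the Swiss Army Knife app takes files and a command, so please check them against the app's help before launching a job.

```python
import posixpath
import shlex


def bedtools_subset_cmd(pvcf, genes_bed, out_vcf):
    """Shell command that keeps only the records of `pvcf` overlapping `genes_bed`.

    `-header` makes bedtools print the VCF header so the output stays a valid VCF.
    """
    return "bedtools intersect -header -a {} -b {} > {}".format(
        shlex.quote(pvcf), shlex.quote(genes_bed), shlex.quote(out_vcf)
    )


def swiss_army_knife_cmd(pvcfs, genes_bed):
    """Build one `dx run` call that subsets every pVCF in a single job.

    Assumption: input files land in the job's working directory under their
    base names, so the inner command refers to base names only.
    """
    inputs = " ".join("-iin=" + shlex.quote(p) for p in list(pvcfs) + [genes_bed])
    bed_name = posixpath.basename(genes_bed)
    steps = " && ".join(
        bedtools_subset_cmd(posixpath.basename(p), bed_name, "subset_{}.vcf".format(i))
        for i, p in enumerate(pvcfs, start=1)
    )
    return "dx run swiss-army-knife {} -icmd={}".format(inputs, shlex.quote(steps))


if __name__ == "__main__":
    # Hypothetical project paths, for illustration only.
    print(swiss_army_knife_cmd(["/data/chunk1.vcf.gz", "/data/chunk2.vcf.gz"],
                               "/data/genes.bed"))
```

The point of doing it this way is that only the small per-gene VCFs need to reach the notebook, not the ten full pVCFs.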
Hi! Question: How to retrieve all fields (from phenotypic data) for a specific sample or list of samples (provided a file for example)?
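One possible way to do this, sketched under the assumption that the phenotypic fields have already been exported to a CSV table whose participant ID column is called `eid`, and that the sample list is a plain text file with one ID per line. The function name and the toy data are made up for illustration; on real data you would pass open file handles instead of the in-memory strings.

```python
import csv
import io


def rows_for_samples(table, wanted_ids, id_column="eid"):
    """Return every row (all columns) whose ID appears in `wanted_ids`.

    `table` is a file-like object holding CSV text with a header row;
    `wanted_ids` is any iterable of ID strings, e.g. an open text file.
    """
    wanted = {line.strip() for line in wanted_ids if line.strip()}
    reader = csv.DictReader(table)
    return [row for row in reader if row[id_column] in wanted]


# Toy stand-ins for the exported phenotype table and the sample list file.
demo_table = io.StringIO("eid,age,sex\n1001,54,F\n1002,61,M\n1003,47,F\n")
demo_ids = io.StringIO("1003\n1001\n")

selected = rows_for_samples(demo_table, demo_ids)
for row in selected:
    print(row)
```

Rows come back in the order they appear in the table, not in the order of the ID list.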
Just saw these!
I'm back, in case folks have questions!
Hi, I just created a project with UKBB datasets. I am trying to use the cohort browser or JupyterLab to select samples, but I cannot find "Dataset" or "cohort browser" anywhere in the interface as shown in the documentation. Am I missing anything obvious here?
I think I figured it out. There is a `.dataset` file under the project directory, which should be the one to use in JupyterLab (not tried yet). Then "explore" -> "Add filter", etc. should be the "cohort browser" that is shown in the documentation.
Hi Bo! If you select the dataset and then click on the graph icon in the upper right-hand corner, it will take you to the cohort browser. Please let me know if you have any issues!
Ben
Thanks Ben, I thought I got it, but maybe I did not. Right now, I have a dataset file. By either clicking the filename or clicking "Explore Data" from the three-dot menu to the right, I get to an interface with "Dashboard Actions" in the top right corner and "add filter" in the middle, which I can use to filter subjects. Isn't this the "cohort browser"?
Yep! Have you been able to add tiles yet? I find them helpful when I'm starting to scope a data problem.
Ben
I am learning the cohort browser by adding filters. The cohort browser (although I cannot find such a name anywhere) seems to be easier to use than the JupyterLab/Spark method I saw in another tutorial, but I suppose that method is more suitable for batch processing. I will continue to explore the system and ask if I have any questions.
@Ben Busby, this forum may not be the best place to ask MTA-related questions, but are you aware of any restrictions on adding our own data to a UKBB project and analyzing it in conjunction with UKBB data? I browsed through the MTA and did not find an answer (I did not read it word by word).
Awesome!
Great!
Hi @Ben Busby, what is the recommended way to annotate variants that I see in the genome section of the cohort browser? I see a number of variants that I am interested in. I would like to know whether they are pathogenic (using ClinVar or gnomAD) and then find their carriers (and check the carriers' phenotypes as the next step). I know that I can download the variants and run an annotator offline, but I am wondering what the best way to do that on DNAnexus is. Thanks.
Hi Ben!
I have a question.
Let's say I want to perform a linear or logistic regression as Phenotype ~ Genotype + Covariates.
I want to perform this task in RStudio/JupyterLab using R. I understand how to use the cohort browser to select a cohort, but I need the data on unrelated individuals only. I do not understand how to create a cohort of unrelated individuals, or how to then add genotype data for the same set of individuals to perform my analysis.
Please help me understand how I can load phenotype, genotype, and covariate data into the RAP's RStudio/JupyterLab for my analysis. I have been stuck on this for too long.
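Not an official answer, just a minimal sketch of the data preparation step, assuming the phenotype, genotype, and covariates have already been exported as tables keyed by participant ID and that a list of related pairs is available (for example from a kinship file with a threshold already applied). Every name and value below is hypothetical toy data. The related-sample removal is a simple greedy heuristic, not necessarily the approach used in any published UKB analysis.

```python
from collections import defaultdict


def drop_related(sample_ids, related_pairs):
    """Greedily drop samples until no related pair is left among the kept ones."""
    keep = set(sample_ids)
    neighbours = defaultdict(set)
    for a, b in related_pairs:
        if a != b and a in keep and b in keep:
            neighbours[a].add(b)
            neighbours[b].add(a)
    while True:
        # Remove the sample that still has the most relatives (ties broken by ID).
        worst = max(neighbours, key=lambda s: (len(neighbours[s]), s), default=None)
        if worst is None or not neighbours[worst]:
            break
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
        keep.discard(worst)
    return keep


def build_rows(phenotype, genotype, covariates, keep):
    """Inner-join the three tables on sample ID, restricted to `keep`."""
    rows = []
    for sid in sorted(keep):
        if sid in phenotype and sid in genotype and sid in covariates:
            rows.append({"id": sid, "phenotype": phenotype[sid],
                         "genotype": genotype[sid], **covariates[sid]})
    return rows


# Toy data: B is related to both A and C, so dropping B alone resolves both pairs.
phenotype = {"A": 1, "B": 0, "C": 1, "D": 0}
genotype = {"A": 2, "B": 1, "C": 0, "D": 1}   # e.g. alternate-allele counts
covariates = {s: {"age": age, "sex": sex}
              for s, age, sex in [("A", 54, 0), ("B", 61, 1), ("C", 47, 0), ("D", 58, 1)]}
related_pairs = [("A", "B"), ("B", "C")]

unrelated = drop_related(phenotype, related_pairs)
rows = build_rows(phenotype, genotype, covariates, unrelated)
```

Written out as a CSV, a table like `rows` could then be read into R and fitted with something like `glm(phenotype ~ genotype + age + sex, family = binomial)` for a logistic model, or `lm(...)` for a linear one.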