Is there a way to exclude participants whose genetic sex (DF: 22001) does not match their reported sex (DF: 31) within the Cohort Browser?
I assume I could do it with a filter such as SEX | is MALE and GENETIC SEX | is MALE, but then I would end up with two cohorts separated by gender. Am I missing something?
Comments
Would such a query be something like "(SEX is MALE AND GENETIC SEX is FEMALE) OR (SEX is FEMALE AND GENETIC SEX is MALE)"?
If so, for more complex filters that combine AND/OR rules, there might be other ways to extract the phenotypic data. I tested these:
A ) I would extract such a cohort by applying a filter using SQL / dxdata. Following https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb, I was able to filter phenotypes in Spark-based JupyterLab:
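import dxdata
import dxpy
import pyspark

# Start Spark (once per notebook session)
sc = pyspark.SparkContext()
spark = pyspark.sql.SparkSession(sc)

# Locate the dispensed dataset. This lookup follows the linked notebook and
# assumes the app*.dataset record sits in the project root.
dispensed_dataset_id = dxpy.find_one_data_object(typename="Dataset", name="app*.dataset", folder="/", name_mode="glob")["id"]
dataset = dxdata.load_dataset(id=dispensed_dataset_id)
participant = dataset["participant"]

# eid, reported sex (field 31), genetic sex (field 22001)
field_names = ['eid', 'p31', 'p22001']
df = participant.retrieve_fields(names=field_names, engine=dxdata.connect())

# Using SQL syntax: keep participants whose genetic and reported sex disagree
cohort = df.filter('((p22001 = 0) AND (p31 = 1)) OR ((p22001 = 1) AND (p31 = 0))')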
The results can then be exported to pandas or CSV, or you can wrap the same condition in NOT to get the cohort with these participants excluded.
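To make the exclusion step concrete outside Spark, here is a minimal pandas sketch on made-up rows (not real UK Biobank data). It assumes the raw codings 0 = Female and 1 = Male for both p31 and p22001, and it treats a missing genetic sex as "not a mismatch", which is a choice you may want to revisit.

```python
import pandas as pd

# Toy stand-in for the retrieved fields; the eids and values are invented.
df = pd.DataFrame({
    "eid": [1, 2, 3, 4, 5],
    "p31": [1, 0, 1, 0, 1],          # reported sex
    "p22001": [1, 0, 0, 1, None],    # genetic sex (None = not available)
})

# Same idea as the SQL filter: both codes are present and they differ.
mismatch = df["p22001"].notna() & (df["p31"] != df["p22001"])

inconsistent = df[mismatch]    # participants to exclude
consistent = df[~mismatch]     # cohort after the exclusion (the NOT case)

print(inconsistent["eid"].tolist())
print(consistent["eid"].tolist())
```

One difference to be aware of: in Spark SQL, a row with a null p22001 makes the whole condition evaluate to null, so both the filter and its NOT drop that row. The sketch above keeps such rows in the "consistent" set instead.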
B ) Non-programming approach: I created two cohorts, one for each mismatch combination (SEX is MALE and GENETIC SEX is FEMALE; SEX is FEMALE and GENETIC SEX is MALE).
For each of them, I exported the participant IDs using the Data Preview tab (there is a limit of 30k rows). I merged the two files into one and added a new column, "inconsistent_sex". This file was the input for the Dataset Extender app, which created a new dataset that I could open in Cohort Browser. The new field now exists there, so it can be used as a filter (EQUALS, NOT EQUALS, etc.).
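The merge step before Dataset Extender could look roughly like this in pandas. The two CSV strings stand in for the Data Preview exports; the column name `eid` and the flag value 1 are assumptions, and the exact input format Dataset Extender expects should be checked against the app's documentation.

```python
import io
import pandas as pd

# Stand-ins for the two exported participant ID files (contents are invented).
male_reported_female_genetic = io.StringIO("eid\n1001\n1002\n")
female_reported_male_genetic = io.StringIO("eid\n2001\n")

ids = pd.concat(
    [pd.read_csv(male_reported_female_genetic),
     pd.read_csv(female_reported_male_genetic)],
    ignore_index=True,
).drop_duplicates(subset="eid")

# New field that becomes filterable once the dataset has been extended.
ids["inconsistent_sex"] = 1

merged_csv = ids.to_csv(index=False)
print(merged_csv)
```

With real exports you would replace the `io.StringIO` objects with the paths of the downloaded files.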
C ) Similarly, if your goal is just to export phenotypes for the selected cohorts and do your downstream analysis outside of Cohort Browser, you can create the two cohorts, save them as derived cohorts, and export the fields you need from each using Table Exporter. You can then merge the two resulting files and apply simple filters.
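As a rough illustration of the "merge and apply simple filters" step, the sketch below concatenates two exported tables and uses them to drop the flagged participants from a wider phenotype table. All file contents are invented, and the `age` column is a hypothetical placeholder rather than a real UK Biobank field name.

```python
import io
import pandas as pd

# Invented stand-ins for the two Table Exporter outputs (one per mismatch cohort).
cohort_a = pd.read_csv(io.StringIO("eid,p31,p22001\n1001,1,0\n1002,1,0\n"))
cohort_b = pd.read_csv(io.StringIO("eid,p31,p22001\n2001,0,1\n"))

# Invented stand-in for a wider phenotype export; "age" is a placeholder column.
phenotypes = pd.read_csv(io.StringIO("eid,age\n1001,54\n2001,61\n3001,47\n3002,66\n"))

flagged = pd.concat([cohort_a, cohort_b], ignore_index=True)

# Simple filter: keep only participants that are in neither mismatch cohort.
clean = phenotypes[~phenotypes["eid"].isin(flagged["eid"])]
print(clean["eid"].tolist())
```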
Excellent, thank you for the in-depth response! I believe I will personally utilize option B)
I was just wondering: on the DNAnexus documentation website they highlight the ability to combine cohorts, is this not available on the RAP? Or is this an issue with my browser? (https://documentation.dnanexus.com/user/cohort-browser)
Actually, that was one of the ideas I had yesterday, but I was not sure how to get it running. It works today! Just save the two cohorts, open one of them, click Combine, and add the second one plus the desired set logic.
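If it helps to reason about which set logic to pick, the operations can be mimicked with plain Python sets. This is only an illustration with invented participant IDs; the names of the options in the Combine dialog may differ from the terms used in the comments here.

```python
# Invented participant IDs for the two saved cohorts and a wider population.
cohort_a = {1001, 1002}   # SEX is MALE AND GENETIC SEX is FEMALE
cohort_b = {2001}         # SEX is FEMALE AND GENETIC SEX is MALE
everyone = {1001, 1002, 2001, 3001, 3002}

mismatched = cohort_a | cohort_b   # union: in either cohort
overlap = cohort_a & cohort_b      # intersection: in both (empty here by construction)
kept = everyone - mismatched       # difference: everyone except the mismatched

print(sorted(mismatched), sorted(overlap), sorted(kept))
```

For the original question, the union of the two mismatch cohorts is the group to exclude.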