Hi! Question: how can I retrieve all fields (from the phenotypic data) for a specific sample or a list of samples (provided in a file, for example)?

Former User of DNAx Community_85

08 February 2022 00:00
12 comments


Ben Busby DNAnexus Team
- 08 February 2022 19:32
Hi Catarina! Let me take a look!

Ben Busby DNAnexus Team
- 08 February 2022 19:34
I don't see the file. Can you tell me which IDs you are using?

Ben Busby DNAnexus Team
- 08 February 2022 19:36
Aha, I think this should answer your question; please let me know if it doesn't: https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb

Former User of DNAx Community_85
- 08 February 2022 19:39
Participant IDs. They are specific to my project, and I don't think I should share them (?)
I have checked that link, but I couldn't retrieve all the data fields for a specific participant (or group of participants).

Ben Busby DNAnexus Team
- 08 February 2022 19:41
Agreed, don't share the IDs here.
I just needed to know which kind of IDs.

Ben Busby DNAnexus Team
- 08 February 2022 19:43
Are you certain all of the fields you wanted were selected in the Showcase?
If you just got access to them, you may have to re-dispense the data to your project.

Documentation on that, for completeness: https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/creating-a-project

Ben Busby DNAnexus Team
- 08 February 2022 19:50
This is probably obvious, but (for others) you can check this by running dataset.entities.

Former User of DNAx Community_85
- 08 February 2022 19:53
I am going to recheck the first link you sent.
My problem was retrieving data for a particular set of participants (eids, e.g. 1234567 and 12345678).

Thanks!

Ben Busby DNAnexus Team
- 08 February 2022 20:11
Check it and let me know if the answer is not there.

I'll be back on tomorrow morning at the latest!

Ben Busby DNAnexus Team
- 09 February 2022 13:32
Catarina, this may also be helpful:

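```python
# Assumes field_names_for_id() is already defined (e.g. from the notebook linked above), and
# that feature_list and feature_code_mapping (feature name -> UKB field ID) exist already.
field_names = []
for feature in feature_list:
    names = field_names_for_id(feature_code_mapping[feature])
    print(feature)
    print(names)
    field_names += names  # accumulate the field names for every feature
```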
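For readers who don't have the notebook open: UK Biobank RAP field names follow the pattern p<field ID>, optionally followed by _i<instance> and _a<array index> (for example p21001_i0). The sketch below is a hypothetical, standard-library-only stand-in for field_names_for_id, not the notebook's implementation: it assumes you already have every field name as a plain list of strings and picks out the ones that belong to one field ID.

```python
import re


def field_names_for_id_local(field_id, all_field_names):
    """Hypothetical helper: return the field names belonging to one UKB field ID.

    Matches names of the form p<id>, p<id>_i<n>, p<id>_a<m> or p<id>_i<n>_a<m>
    and sorts them numerically by instance, then by array index.
    """
    pattern = re.compile(r"^p{}(?:_i(\d+))?(?:_a(\d+))?$".format(int(field_id)))
    matches = []
    for name in all_field_names:
        m = pattern.match(name)
        if m:
            # Missing instance/array suffixes sort first (-1).
            instance = int(m.group(1)) if m.group(1) else -1
            array = int(m.group(2)) if m.group(2) else -1
            matches.append((instance, array, name))
    return [name for _, _, name in sorted(matches)]


# Usage with a made-up list of field names:
all_names = ["eid", "p31", "p21001_i10", "p21001_i2", "p21001_i0", "p210010_i0", "p41270_a0"]
print(field_names_for_id_local(21001, all_names))
# -> ['p21001_i0', 'p21001_i2', 'p21001_i10']
```

The numeric sort keeps p21001_i2 ahead of p21001_i10, which a plain string sort would not, and the anchored pattern stops field 21001 from also matching p210010_i0.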

Ben Busby DNAnexus Team
- 09 February 2022 16:08
Hi! Instead of bits and pieces, my friend @Ondrej Klempir put everything in one place using Koalas:

```python
# Run after the phenotypes are successfully loaded into a Spark DataFrame named df_phenotypes
# (see https://github.com/dnanexus/OpenBio/blob/master/UKB_notebooks/ukb-rap-pheno-basic.ipynb)
import databricks.koalas as ks  # importing Koalas adds .to_koalas() to Spark DataFrames

list_of_eids = ["1234567", "12345678"]  # can be hardcoded or, e.g., loaded from a file

df_phenotypes_koalas = df_phenotypes.to_koalas()  # convert the Spark DataFrame so it can be filtered with Koalas
print(df_phenotypes_koalas.shape)  # check the shape before filtering

# keep only the rows whose eid is in list_of_eids
filtered_phenotypes = df_phenotypes_koalas[df_phenotypes_koalas["eid"].isin(list_of_eids)]
print(filtered_phenotypes.shape)  # check the shape after filtering
```
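The Koalas snippet above mentions that the ID list can also be loaded from a file, which is what the original question asked about. A minimal standard-library sketch of that step follows; parse_eids and load_eids are hypothetical helper names, and the file layout (one ID per line, ID in the first comma-separated column, optional header row) is an assumption. The IDs are kept as strings to match the hardcoded example; if the eid column in your DataFrame holds integers instead, convert them with int() so the isin filter compares like with like.

```python
from pathlib import Path


def parse_eids(lines, header=False):
    """Turn lines of text into a de-duplicated list of participant IDs (as strings).

    Takes the first comma-separated column of each line, skips blank lines and an
    optional header row, and keeps the first occurrence of each ID in file order.
    """
    lines = list(lines)
    if header:
        lines = lines[1:]
    eids, seen = [], set()
    for line in lines:
        value = line.split(",")[0].strip()
        if value and value not in seen:
            seen.add(value)
            eids.append(value)
    return eids


def load_eids(path, header=False):
    """Read participant IDs from a text/CSV file (one ID per line, first column)."""
    return parse_eids(Path(path).read_text().splitlines(), header=header)


# Example: parse_eids(["eid", "1234567", "12345678", "", "1234567"], header=True)
# returns ['1234567', '12345678']
# With a real file (hypothetical name): list_of_eids = load_eids("my_eids.csv", header=True)
```

Splitting parsing from file reading keeps the parsing logic easy to check on a few literal lines before pointing it at the real file.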

Former User of DNAx Community_86
- 10 June 2023 09:50
Hi all, I would be grateful for some help on this topic too. I've been trying to create a cohort of specific eids (several hundred) but am having no luck. I've tried dxdata.create_cohort; when I run it, it does create a cohort in the right folder, but the cohort always opens with an error. Can anyone share example code showing how to use dxdata.create_cohort properly?

I've seen the solution using the Koalas DataFrame, but doesn't that involve querying the entire 500,000-participant dataset for your fields of interest first? Wouldn't that use a huge amount of computing power? I would be grateful for any advice! Thanks, David
