Questions/Error: ukb-rap-pheno-basic.ipynb
Hello, I have always used this notebook (ukb-rap-pheno-basic.ipynb) to extract variables from my UKBB project.
Today, when I ran the exact same script and reached the step that finds the field names for a given ID, I got the message shown below.
#Age when attending assessment centre has multiple instances (visits):
field_names_for_id('21003')
This error occurs:
/tmp/ipykernel_631/339987039.py:7: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  return sorted(fields, key=lambda f: LooseVersion(f.name))
Then, when I try to retrieve the fields, I get this error:
df = participant.retrieve_fields(names=field_names, engine=dxdata.connect())
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/cluster/dnax/jars/dnanexus-api-0.1.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/cluster/spark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Can you please advise?
Thank you,
Alyssa
Comments
Hi Alyssa,
does the script fail, or does it continue after the warnings?
Does the same thing happen if you extract a different variable?
It fails to retrieve any variables, and it also doesn't let me move on to exporting the created dataset.
I just ran the code from ukb-rap-pheno-basic.ipynb on an instance with a Spark cluster.
It generated several warnings, but in the end I did manage to save a TSV file with the data (for fields 31, 21022, 41262, 50, 20047).
The step that initializes Spark generated the SLF4J warnings.
The step field_names_for_id generated "/tmp/ipykernel_595/3738977934.py:5: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. return sorted(fields, key=lambda f: LooseVersion(f.name))"
The step participant.retrieve_fields said:
2023-08-12 15:20:51.418 WARN ShellBasedUnixGroupsMapping:210 - unable to return groups for user vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB
PartialGroupNameException The user name 'vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB' is not found. id: ‘vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB’: no such user
id: ‘vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB’: no such user
at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:294)
etc
It may be that something else was also failing. Could you try again with a new instance, ignore the warnings above, and see whether you can access the data?
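For anyone reading later: the reason these messages can usually be ignored is that a Python warning (such as the DeprecationWarning from field_names_for_id) is reported but does not stop the cell from running. Here is a minimal sketch of that behaviour. It is not taken from the notebook, and noisy_step is a hypothetical stand-in for a notebook step that emits the warning.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a notebook step that emits a DeprecationWarning.
    warnings.warn("distutils Version classes are deprecated.", DeprecationWarning)
    return "finished"


# Record warnings instead of printing them, to show the step still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = noisy_step()

print(result)                                  # the step's return value
print([w.category.__name__ for w in caught])   # the warning categories seen

# If the message is just noise, it can be hidden for a single block of code.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    quiet_result = noisy_step()
```

The filter only hides the message. It does not change what the step does, so a real failure (an exception) would still show up.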
Hi Rachel, apologies for the delay. You're right, it still produces a file, regardless of the warnings.
Thank you for your help.
Alyssa
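A note for anyone who would rather remove the DeprecationWarning than ignore it: it comes from using distutils' LooseVersion as the sort key in field_names_for_id. One possible replacement is a small natural-sort key. This is only a sketch. natural_key is a hypothetical helper, and the example names assume fields are called something like p21003_i0, which is not shown in this thread.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 'i10' sorts after 'i2'."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]


# Hypothetical field names, for illustration only.
names = ["p21003_i10", "p21003_i2", "p21003_i0", "p21003_i1"]
sorted_names = sorted(names, key=natural_key)
print(sorted_names)
```

In the notebook the key would presumably be used as sorted(fields, key=lambda f: natural_key(f.name)), but I have not tested that against dxdata.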