Questions/Error: ukb-rap-pheno-basic.ipynb

Hello, I have always used this notebook (ukb-rap-pheno-basic.ipynb) to extract variables from my UKBB project.

Today, when I run the exact same script, I hit a problem at the step that finds the field names for a given ID.

#Age when attending assessment centre has multiple instances (visits):

field_names_for_id('21003')

This error occurs:

/tmp/ipykernel_631/339987039.py:7: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  return sorted(fields, key=lambda f: LooseVersion(f.name))
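
For readers who want to avoid this DeprecationWarning altogether, here is a minimal sketch of a sort key that does not depend on distutils. It is not the notebook's own code: `natural_key` is a hypothetical helper, and the field names are illustrative values in the `p<field>_i<instance>` style. It only shows one way to order such names so that numeric parts compare as numbers.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so numeric parts compare as numbers."""
    # re.split with a capture group alternates text and digit chunks,
    # so the same position always holds the same type across names.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]

# Illustrative field names (assumed format, not retrieved from a real dataset).
names = ["p21003_i10", "p21003_i2", "p21003_i0", "p21003_i1"]
sorted_names = sorted(names, key=natural_key)
print(sorted_names)
```

In the notebook, a key like this could presumably replace the `LooseVersion(f.name)` call inside `sorted(...)`, but that has not been tested on the RAP.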

Then, when I try to retrieve the fields, I get this error:

df = participant.retrieve_fields(names=field_names, engine=dxdata.connect())

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/cluster/dnax/jars/dnanexus-api-0.1.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/cluster/spark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

Can you please advise?

Thank you,

Alyssa

Comments

  • Rachael W (UKB Community team, Data Analyst)

    Hi Alyssa,

    Does the script fail, or does it continue after the warnings?

    Does the same thing happen if you extract a different variable?
  • It fails to retrieve any variables, and it also doesn't let me go on to export the dataset I created.
  • Rachael W (UKB Community team, Data Analyst)

    I just ran the code from ukb-rap-pheno-basic.ipynb on an instance with a Spark cluster.

    It generated several warnings, but in the end I did manage to save a TSV file with the data (for fields 31, 21022, 41262, 50, 20047).

    The step to initialize Spark generated the SLF4J warnings.

    The step field_names_for_id generated:

    /tmp/ipykernel_595/3738977934.py:5: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
      return sorted(fields, key=lambda f: LooseVersion(f.name))

    The step participant.retrieve_fields said:

    2023-08-12 15:20:51.418 WARN ShellBasedUnixGroupsMapping:210 - unable to return groups for user vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB
    PartialGroupNameException The user name 'vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB' is not found. id: 'vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB': no such user
    id: 'vKvF2gygkj6b6q9zxBx7Pz4bfVyFPYJGPY7kpx0y__project-G8p6vGjJz7Gq8bZ053jkK6VB': no such user
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:294)
    etc

    It may be that something else was also failing. Could you try again with a new instance, ignore the warnings above, and see whether you can access the data?
  • Hi Rachael, apologies for the delay. You're right, it still produces a file despite the warnings.

    Thank you for your help.

    Alyssa
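
As the thread concluded, the messages were warnings rather than errors. The sketch below shows one way to check that distinction in plain Python. `retrieve_with_warning` is a hypothetical stand-in for a notebook step, not a dxdata function, and the returned names are made up for illustration. It emits a DeprecationWarning and still returns its result, and the surrounding code records the warning while confirming that no exception was raised.

```python
import warnings

def retrieve_with_warning():
    """Hypothetical stand-in for a step that warns but still succeeds."""
    warnings.warn("distutils Version classes are deprecated.", DeprecationWarning)
    return ["p21003_i0", "p21003_i1"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of hiding repeats
    try:
        result = retrieve_with_warning()
        failed = False
    except Exception:
        # Only a real exception would stop the step from producing a result.
        result = None
        failed = True

print("warnings seen:", [w.category.__name__ for w in caught])
print("call failed:", failed)
print("result:", result)
```

If a real notebook step behaves the same way (warnings printed, no traceback), the following cells should still run and the export should still produce a file.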
