Data types

Permanently deleted user
How can I get information regarding the correct data type of each column in a dataset (e.g. integer, float, string, etc.)?

Comments

3 comments

  • Comment author
    Ondrej Klempir DNAnexus Team

    In order to get this info, I normally go to Cohort Browser and check details about each column, e.g.:

     

    [Screenshot 2022-11-23 at 14.30.13]

    It also contains a link to the UKB Showcase, where this information is available as well.

     

    https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=12143

  • Comment author
    Permanently deleted user

    Thanks. I am interested in a way to automate this for a potentially large number of columns. Ideally, I would need a per-field summary table. Does anything like that exist?

  • Comment author
    Ondrej Klempir DNAnexus Team

    I believe you should be able to get this programmatically, taking inspiration from this notebook:

    https://github.com/dnanexus/OpenBio/blob/master/dxdata/getting_started_with_dxdata.ipynb

     

    From the notebook:

    • Field - Represents all other DataDictionary columns not covered by Entity and Edge
      • name (str): Field's internal name.
      • type (str): Primitive type name, e.g. "integer", "string", "date" ...
      • table_name (str): Database table where this field's data values are stored.
      • column_name (str): Database column where field values are stored.
      • coding (Coding): Coding instance that applies to this field's values.
      • is_multi_select (bool): Whether the field can contain multiple values per cell (array/set type)
      • is_sparse_coding (bool): Whether all data values should be covered by codings.
      • title (str): Optional.
      • description (str): Optional.
      • units (str): Optional.
      • concept (str): Optional.
      • linkout (str): Optional.
      • **kwargs (dict of strings): Arbitrary additional attributes with string values.

     

    Do not forget to run this in a Spark-based JupyterLab session.
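
As a rough sketch of the per-field summary table asked about above, something like the following might work. The helpers `summarize_fields`, `to_csv`, and `load_participant_fields` are hypothetical names introduced here for illustration. The attribute names (`name`, `type`, `title`, `units`, `is_multi_select`) come from the Field description quoted above. The `dxdata` calls (`load_dataset(id=...)`, indexing the dataset by the `"participant"` entity, and its `.fields` attribute) are assumptions modelled on the linked notebook and should be checked against it.

```python
import csv
import io

# Attribute names taken from the Field description quoted above.
SUMMARY_COLUMNS = ["name", "type", "title", "units", "is_multi_select"]


def summarize_fields(fields):
    """Return one dict per field, holding the attributes in SUMMARY_COLUMNS.

    Attributes a field does not have are reported as None.
    """
    return [
        {col: getattr(field, col, None) for col in SUMMARY_COLUMNS}
        for field in fields
    ]


def to_csv(rows):
    """Render the summary rows as CSV text (None becomes an empty cell)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SUMMARY_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_participant_fields(dataset_id):
    """Assumed dxdata usage, modelled on the linked notebook; verify it there.

    dataset_id is a placeholder for the ID of your own dispensed dataset.
    """
    import dxdata  # expected to be available only inside DNAnexus JupyterLab

    dataset = dxdata.load_dataset(id=dataset_id)
    participant = dataset["participant"]  # entity name is an assumption
    return participant.fields
```

In a Spark-based JupyterLab session, `print(to_csv(summarize_fields(load_participant_fields(my_dataset_id))))` would then print one row per field, where `my_dataset_id` stands for your own dataset ID. Keeping the dxdata import inside the loader means the two summary helpers can be tried on any objects that expose the same attributes.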

