Data types

Permanently deleted user, 23 November 2022 00:00

How can I get information about the correct data type of each column in a dataset (e.g. integer, float, string)?

Ondrej Klempir (DNAnexus Team), 23 November 2022 13:35

To get this information, I normally go to the Cohort Browser and check the details of each column. The details also include a link to the UKB Showcase, where this information is available as well, e.g.:
https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=12143

Permanently deleted user, 23 November 2022 15:25

Thanks. I am interested in a way to automate this for a potentially large number of columns. Ideally, I would need a per-field summary table. Does anything like that exist?

Ondrej Klempir (DNAnexus Team), 23 November 2022 15:40

I believe you should be able to get this programmatically, taking inspiration from this notebook:
https://github.com/dnanexus/OpenBio/blob/master/dxdata/getting_started_with_dxdata.ipynb

"Field - Represents all other DataDictionary columns not covered by Entity and Edge.
name (str): Field's internal name.
type (str): Primitive type name, e.g. "integer", "string", "date" ...
table_name (str): Database table where this field's data values are stored.
column_name (str): Database column where field values are stored.
coding (Coding): Coding instance that applies to this field's values.
is_multi_select (bool): Whether the field can contain multiple values per cell (array/set type).
is_sparse_coding (bool): Whether all data values should be covered by codings.
title (str): Optional.
description (str): Optional.
units (str): Optional.
concept (str): Optional.
linkout (str): Optional.
**kwargs (dict of strings): Arbitrary additional attributes with string values."

Do not forget to run this in a Spark-based JupyterLab.
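Below is a minimal sketch of the per-field summary table asked about above. It assumes only the Field attributes quoted in the last comment (name, type, coding, is_multi_select, title, units). The helper names summarize_fields, count_by_type and to_csv are hypothetical. The demo uses stand-in objects instead of real dxdata Field instances, so it runs without the platform.

```python
import csv
import io
from collections import Counter
from dataclasses import asdict, dataclass
from dataclasses import fields as dataclass_fields
from types import SimpleNamespace  # used only for the stand-in demo objects
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class FieldSummary:
    """One row of the per-field summary table."""
    name: str
    type: str
    title: Optional[str]
    units: Optional[str]
    has_coding: bool
    is_multi_select: bool


def summarize_fields(fields: Iterable[Any]) -> List[FieldSummary]:
    """Collect type-related attributes from dxdata-style Field objects.

    Assumes each object has `name` and `type`; the optional attributes
    (title, units, coding, is_multi_select) default to None/False if absent.
    """
    rows = []
    for field in fields:
        rows.append(
            FieldSummary(
                name=field.name,
                type=field.type,
                title=getattr(field, "title", None),
                units=getattr(field, "units", None),
                has_coding=getattr(field, "coding", None) is not None,
                is_multi_select=bool(getattr(field, "is_multi_select", False)),
            )
        )
    return rows


def count_by_type(rows: Iterable[FieldSummary]) -> Dict[str, int]:
    """Count how many fields share each primitive type."""
    return dict(Counter(row.type for row in rows))


def to_csv(rows: Iterable[FieldSummary]) -> str:
    """Render the summary as CSV text, one line per field (None -> empty cell)."""
    buffer = io.StringIO()
    header = [f.name for f in dataclass_fields(FieldSummary)]
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical stand-ins for dxdata Field objects (illustration only).
    demo = [
        SimpleNamespace(name="visit_date", type="date", title="Date of visit"),
        SimpleNamespace(name="age", type="integer", units="years"),
        SimpleNamespace(name="diagnoses", type="string", coding=object(),
                        is_multi_select=True),
    ]
    demo_rows = summarize_fields(demo)
    print(to_csv(demo_rows))
    print(count_by_type(demo_rows))
```

On the platform, the fields would come from dxdata instead of the stand-ins. Following the linked notebook, the input would roughly be `dataset = dxdata.load_dataset(id=...)` followed by `summarize_fields(dataset["participant"].fields)`. Treat these calls and the "participant" entity name as assumptions to check against the dxdata version in your Spark JupyterLab. Writing CSV instead of building a pandas DataFrame keeps the sketch dependency-free. The output of `to_csv(rows)` can be saved to a file and opened in any spreadsheet tool.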