How can we programmatically check instance indexing semantics?
This page indicates that instance indexing can have a different meaning for different fields. Some follow instancing coding 2 and some do not. How do I know which fields do and which do not? The data dictionary viewable from dx extract_dataset -ddd does not seem to say. The "longitudinal_axis_type" field seems relevant, but it is blank. A concrete example of this problem is p20078_i3 and p20078_i4; if I were not careful, I would assign p20078_i3 to instance 3 happening on the date p53_i3, but there is no p53_i4, so trying to assign a date to p20078_i4 tips me off to a potential problem. Thank you for your help.
Comments
Hi Eric,
There is a file called field.txt, with columns including field_id and instance_id, that maps each field number to its instancing id. The file does not include any participant data and is downloadable from Showcase at https://biobank.ndph.ox.ac.uk/showcase/schema.cgi?id=1 and, for example, it maps field_id 20078 to instance_id 1.
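For readers who want to do this lookup in code, here is a minimal sketch in Python. It assumes field.txt is tab-separated with a header row; only the field_id and instance_id column names come from this thread. The function name load_field_instancing and the inline sample text are hypothetical stand-ins so the snippet runs without the download.

```python
import csv
import io


def load_field_instancing(text, delimiter="\t"):
    """Map field_id -> instance_id from the text of a field.txt-style table.

    Assumption: the file is tab-separated with a header row that contains
    the columns field_id and instance_id (the two columns named above).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    mapping = {}
    for row in reader:
        raw = (row.get("instance_id") or "").strip()
        # A blank instance_id is stored as None rather than guessed at.
        mapping[int(row["field_id"])] = int(raw) if raw else None
    return mapping


# Hypothetical stand-in for the downloaded file. The two rows use the
# field_id / instance_id pairs mentioned in this thread.
sample_text = "field_id\tinstance_id\n20078\t1\n22670\t2\n"

field_to_instancing = load_field_instancing(sample_text)
print(field_to_instancing[20078])
```

With the real file, the same function could be called on the contents of field.txt read from disk, for example open("field.txt", encoding="utf-8").read(), where the path and encoding are also assumptions.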
Thank you very much! That's exactly what I needed.
People viewing this thread in the future may also want schemas 9 and 10 from https://biobank.ndph.ox.ac.uk/showcase/schema.cgi .
(Indispensably important instance indexing interpretation info is in…)
I still am seeing some confusing examples. Field 22670 “Minimum carotid IMT (intima-medial thickness) at 120 degrees” has instance id 2, so I expected it would have instance indices 0 through 3 as shown in schema 10, and the dates of the measurements would be from Field 53. But in the data dictionary, there is an entry for p22670_i4. Where should I look to find the date of measurement for p22670_i4? Thanks again and I am sorry for the novice questions; I am trying, but it will take me a long time to catch up on the necessary background reading.
Hi Eric,
That is a very good question, and you have found an error in the data. Thank you for informing us. We will investigate further and fix it.
In the meantime, please do not use the “i4” values, as I am not at all sure where they have come from.
Field 22670 is within Category 100000 Assessment centre, so I am confident that instance id 2 is correct for field 22670. From the range of the “i4” values, they look like they could be carotid intima-media thicknesses. It is possible that these “i4” participants had the measurement done twice at their i3 visit, and that the “i4” values should have been listed in an array. For 179 participants, the i4 and i3 values are the same. However, field 22670 is not currently arrayed, and the values that are not the same are not particularly close, so this doesn't quite fit. It is also possible that the i4 values have been recorded with the wrong field_id altogether.
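A check like the one that surfaced the "i4" entries can be automated. The sketch below (Python) parses data dictionary column names of the form p<field>_i<instance>, looks up each field's instancing id, and flags any instance index outside the set allowed for that instancing. The valid index set for instancing 2 (0 through 3) is taken from this thread; in practice it would be built from schema 10. The function name, the regular expression, and the sample column list are hypothetical, and treating Field 53 as instancing 2 is an assumption.

```python
import re

# Matches names such as p22670_i4 or p20078_i3_a1 (the array suffix is optional).
COLUMN_PATTERN = re.compile(r"^p(\d+)_i(\d+)(?:_a(\d+))?$")


def unexpected_instance_columns(columns, field_to_instancing, valid_indices):
    """Return column names whose instance index is not valid for their instancing."""
    flagged = []
    for name in columns:
        match = COLUMN_PATTERN.match(name)
        if match is None:
            continue  # not an instanced column name, so nothing to check
        field_id = int(match.group(1))
        index = int(match.group(2))
        instancing = field_to_instancing.get(field_id)
        allowed = valid_indices.get(instancing)
        # Fields with unknown instancing are skipped rather than flagged.
        if allowed is not None and index not in allowed:
            flagged.append(name)
    return flagged


# Instancing ids as they would come from field.txt; Field 53 is assumed here.
field_to_instancing = {22670: 2, 53: 2}
# Instancing 2 has indices 0 through 3, as described in this thread.
valid_indices = {2: {0, 1, 2, 3}}
# Hypothetical excerpt of data dictionary column names.
columns = ["p53_i3", "p22670_i3", "p22670_i4", "p31"]

flagged = unexpected_instance_columns(columns, field_to_instancing, valid_indices)
print(flagged)
```

Any flagged column has no matching instance in its instancing scheme, so it has no corresponding Field 53 date and is best set aside until its origin is understood.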
Thank you for this rapid and thorough reply! This info is all I need, and I can leave those out for now.