Missing data versus nothing to report
How can I tell whether volunteers skipped a question (for example, the one about self-reported illnesses) or simply had no illnesses to report? I have the same question about the ICD-10 data: how do I know whether that question was missed, or the volunteer had nothing to enter because they had no recorded illnesses?
Thank you
Hi Louise,
In general, missing/NA values can arise for different reasons in different fields. The first thing to do is to check the Showcase metadata for the particular field. There is a direct link to it from the field selection page in the Cohort Browser, and Showcase Search (https://biobank.ndph.ox.ac.uk/showcase/search.cgi) can also be used directly.
There might be information about NA or null values or other field-specific codes meaning “no data” within the Showcase Notes tab for the field. In some cases the Category Description might be relevant. To understand the “linked” data that has been received from the NHS Hospitals or GPs, or from the Death or Cancer Registries, it is necessary to read all the associated Resources for the field, and the Resources mentioned in all the associated Categories.
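Once you have read the field's Notes, it can help to label each raw value explicitly rather than treating every non-answer as "missing". The sketch below is only an illustration. The codes -1 ("Do not know") and -3 ("Prefer not to answer") are an assumption based on codings commonly seen in Showcase. `SPECIAL_CODES` and `classify_value` are hypothetical names, so please copy the real codes from the field's own coding page.

```python
from typing import Optional

# ASSUMPTION: illustrative special codes only. The real codes and their
# meanings vary by field, so copy them from the field's coding in Showcase.
SPECIAL_CODES = {
    -1: "do not know",
    -3: "prefer not to answer",
}


def classify_value(value: Optional[int]) -> str:
    """Label one raw value as empty, a special 'no data' code, or an answer."""
    if value is None:
        return "empty"  # nothing stored; the reason has to be looked up in Showcase
    if value in SPECIAL_CODES:
        return SPECIAL_CODES[value]
    return "answered"


for raw in [None, -1, -3, 7]:
    print(raw, "->", classify_value(raw))
```

Keeping "empty" separate from the special codes preserves the distinction the Notes describe, instead of collapsing everything into NA.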
The Categories are in a hierarchy, and in a few cases there is important information in the Description for a higher Category. This is particularly likely for the genetic data and for the “linked” data. By the way, the Category hierarchy generally matches up quite well with the Folder structures within the RAP, but it is not always exactly the same.
For some fields, there can be several Instances or several Arrays. Arrayed and instanced fields often have a lot of empty values. For more on Arrays and Instances, see this article https://community.ukbiobank.ac.uk/hc/en-gb/articles/15955986227357-What-is-an-instance-index .
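As a rough illustration of why arrayed fields look sparse, here is a sketch that gathers the non-empty array values of one field at one instance for a single participant. It assumes columns named like `p<field>_i<instance>_a<array>`. That pattern is an assumption about your extracted table, so adjust the prefix if your columns differ. Field 12345 and the function name `values_for_instance` are made up for the example.

```python
from typing import Dict, List, Optional

# One participant's row, mapping column name -> raw value (None or "" = empty).
Record = Dict[str, Optional[str]]


def values_for_instance(record: Record, field_id: int, instance: int) -> List[str]:
    """Return the non-empty array values of one field at one instance.

    ASSUMPTION: columns are named 'p<field>_i<instance>_a<array>'. Adjust the
    prefix if your extracted table uses a different naming scheme.
    """
    prefix = f"p{field_id}_i{instance}_a"
    return [
        value
        for column, value in record.items()
        if column.startswith(prefix) and value not in (None, "")
    ]


# Hypothetical participant with a made-up field 12345 recorded at two instances.
row: Record = {
    "eid": "1000001",
    "p12345_i0_a0": "A",
    "p12345_i0_a1": "",
    "p12345_i1_a0": None,
}
print(values_for_instance(row, 12345, 0))  # ['A']
print(values_for_instance(row, 12345, 1))  # []
```

An empty list only means that nothing was stored at that instance. On its own it cannot tell "skipped" apart from "nothing to report".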
For some fields, such as those that come from a participant's answers to an online questionnaire, some questions are only presented to participants for whom it is relevant. For example, if question A asked “Have you ever taken medicine for diabetes”, and question B asked “what kind of diabetes medicine have you taken”, then question B would only be presented to participants who answered “Yes” to question A. The questionnaire flow is generally available as an associated Resource for the field in Showcase.
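The question A / question B logic above can be sketched as a small rule. This rule assumes that B was shown only to participants who answered "Yes" to A, so confirm that against the questionnaire flow Resource for your field. `classify_follow_up` and the answer strings are hypothetical.

```python
from typing import Optional


def classify_follow_up(gate_answer: Optional[str], follow_up: Optional[str]) -> str:
    """Explain an empty or filled follow-up question (B) using its gate question (A).

    ASSUMPTION: B was only presented to participants who answered "Yes" to A.
    Check the questionnaire flow in Showcase before relying on this rule.
    """
    if follow_up not in (None, ""):
        return "answered"
    if gate_answer in (None, ""):
        return "unknown: question A has no value either"
    if gate_answer == "Yes":
        return "missing: B was presented but no answer was recorded"
    return "not applicable: B was not presented"


print(classify_follow_up("No", None))
print(classify_follow_up("Yes", None))
print(classify_follow_up("Yes", "insulin"))
```

An empty B after "No" is expected and is not missing data. Only an empty B after "Yes" points to a genuine gap.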
For fields that rely on a participant's answers to questions, it is of course always possible that the participant forgot some information, or didn't understand the question, or didn't wish to share data. It is generally not possible to detect when this has happened. In a few cases, there may be other fields that provide fuller data. For example, when a participant attended a baseline (initial) visit at some time between 2006 and 2010, they were asked to self-report past illnesses and operations. At a later date, UK Biobank received some historical data from NHS Hospitals, relating to occurrences as early as 1995, which might include some items that the participant forgot to mention.
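One way to use those hospital records is to flag participants who self-reported nothing at baseline but have hospital diagnoses dated before their visit. The sketch assumes you have already built three simple lookups keyed by participant ID. Their names and shapes are hypothetical, and so is `flag_possible_omissions`. Self-reported illnesses and ICD-10 use different codings, so the sketch does not try to match codes. It only lists candidates worth a closer look, and a flagged record does not prove that the participant forgot anything.

```python
from datetime import date
from typing import Dict, List, Set, Tuple


def flag_possible_omissions(
    self_reported: Dict[str, Set[str]],
    hospital: Dict[str, List[Tuple[str, date]]],
    baseline: Dict[str, date],
) -> Dict[str, List[str]]:
    """For participants with no self-reported illnesses, list the hospital
    diagnosis codes dated before their baseline visit.

    A participant absent from `self_reported` is treated the same as one with
    an empty set, so this does not separate "skipped" from "none to report".
    """
    flagged: Dict[str, List[str]] = {}
    for eid, visit_date in baseline.items():
        if self_reported.get(eid):
            continue  # at least one illness was self-reported
        earlier = sorted(
            {code for code, when in hospital.get(eid, []) if when < visit_date}
        )
        if earlier:
            flagged[eid] = earlier
    return flagged


# Made-up example data: two participants and a few ICD-10 codes.
baseline = {"A": date(2008, 5, 1), "B": date(2009, 3, 10)}
self_reported = {"A": set(), "B": {"asthma"}}
hospital = {
    "A": [("I10", date(2001, 2, 3)), ("K35", date(2012, 1, 1))],
    "B": [("J45", date(2000, 1, 1))],
}
print(flag_possible_omissions(self_reported, hospital, baseline))  # {'A': ['I10']}
```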
Please post again if you need any more details that cannot be found in Showcase.
Thank you for using the Forum.