Hi, I recently noticed that some participants were automatically removed from the UKB dataset when I updated it. However, I have been working with the previous data, and I want to remove these participants from my datasets.

How can I achieve this? I do not want to re-select the variables and go through the QC process again prior to analysis.

Comments

12 comments

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hi @Chinonso Odebeatu,

    There is no automatic way that would work for all data types, but if you let the community know what format you are working with, we could possibly recommend a command to remove a specific set of individuals.

  • I am working with phenotype data. Any idea how I can remove the participants? When I first downloaded the data there were 502376 participants, but after a recent update the number has dropped by 8. I don't know how to remove these people from the original dataset.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    It would probably be easiest to filter after you export all fields you need. Are you a Python, R, or bash coder? If so, you can use JupyterLab, R, or Swiss-army-knife to remove such participants.

  • Hi @Chai Fungtammasan, thank you for your reply. I use R. The issue is that I have already extracted the variables I needed and have done quality control, data management and preliminary analysis on them. I want to get the EIDs of the participants who have withdrawn their consent so I can filter them out, but the challenge is accessing these EIDs. Any suggestions?

  • Comment author
    Chai Fungtammasan DNAnexus Team

    I think filtering them out with basic data frame filtering in R would be the most straightforward.

  • Thank you so much @Chai Fungtammasan. Of course I can filter it in R; I'm only asking how I can get the IDs of these participants from UKB so that I can filter them out of the original download.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    I see. My bad, I misunderstood the question.

    I looked into it, and it's more complex than one might think, since EIDs differ from application to application. If you aren't sure which EIDs were removed, you may have to collect all EIDs from the new data and use them to filter the old data. Currently, the platform only removes withdrawn participants at data dispense time. It doesn't have the capability to remove those data once they have been dispensed.
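
The approach described above (collect every EID from the new data, then keep only the matching rows in the old extract) could be sketched roughly as follows. This is only an illustration in Python using the standard library; the column name `eid`, the helper name `keep_participants_in_new_data`, and the tiny in-memory tables are assumptions for the example, not part of the platform.

```python
import csv
import io


def keep_participants_in_new_data(old_rows, new_eids, eid_column="eid"):
    """Return rows of the old extract whose EID still appears in the new data.

    The column name "eid" is an assumption; adjust it to match your export.
    """
    still_present = {str(e).strip() for e in new_eids}
    return [row for row in old_rows if str(row[eid_column]).strip() in still_present]


# Made-up example data; in practice both tables would be read from exported files.
old_csv = io.StringIO("eid,age\n1001,54\n1002,61\n1003,47\n")
new_csv = io.StringIO("eid\n1001\n1003\n")

old_rows = list(csv.DictReader(old_csv))
new_eids = [row["eid"] for row in csv.DictReader(new_csv)]

kept = keep_participants_in_new_data(old_rows, new_eids)

# EIDs that are in the old extract but missing from the new data
removed = sorted({row["eid"] for row in old_rows} - set(new_eids))
```

Both extracts need to come from the same application for this comparison to be meaningful, since EIDs differ between applications. The same idea in R would be a filter of the old data frame on membership in the new EID vector.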

  • Thank you so much @Chai Fungtammasan, I will do as advised.

    Regards

  • Comment author
    Lucy BG UKB Community team Data Analyst

    Hi @Chinonso Odebeatu, these will be participants who have withdrawn since you last dispensed your data. All researchers were very recently emailed a list of participants who have withdrawn for their applications, so you can use the list from your email to filter your RAP data.
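
Filtering an already-extracted table against the emailed withdrawal list might look something like the sketch below. It assumes the list is a plain text file with one EID per line and that the extract has a column named `eid`; both are assumptions for illustration, as are the helper names and the made-up data.

```python
import csv
import io


def load_withdrawn_eids(lines):
    """Parse one EID per line, skipping blank lines (assumed file layout)."""
    return {line.strip() for line in lines if line.strip()}


def drop_withdrawn(rows, withdrawn, eid_column="eid"):
    """Return only the rows whose EID is not in the withdrawal list."""
    return [row for row in rows if str(row[eid_column]).strip() not in withdrawn]


# Made-up data standing in for the emailed list and an extracted phenotype table.
withdrawal_file = io.StringIO("1002\n\n")
extract_file = io.StringIO("eid,bmi\n1001,24.1\n1002,30.5\n1003,27.8\n")

withdrawn = load_withdrawn_eids(withdrawal_file)
rows = list(csv.DictReader(extract_file))
cleaned = drop_withdrawn(rows, withdrawn)
```

Comparing `len(rows) - len(cleaned)` with the number of EIDs in the list that you expect to find is a quick sanity check that the right participants were dropped.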

  • Hi @Lucy B-G, thank you for your reply. Apparently, I didn't get the update on participants that were withdrawn from my own dataset. Do I have to email Biobank about this?

  • Comment author
    Permanently deleted user

    Hi, is it correct to say that after some participants have withdrawn, if a project's data are refreshed, those same participants will no longer be included in some fields but may still be in others? If so, is there a way to know which fields are updated with the withdrawals and which ones are not, without manually (or programmatically) checking each field?

  • Comment author
    Rachael W UKB Community team Data Analyst

    Hi Andrew, for the fields in the main apollo database, where a participant is withdrawn, they will no longer be in the database at all, so there will be no data for them in any of those fields.

    The bulk fields are more complicated, but I believe in most cases you will find that the files associated with a withdrawn participant will have been removed.

    For the population genetics data, I think you will find that withdrawn participants have had their EIDs converted to negative values.

     

    It shouldn't be necessary to check the re-dispensed data. The reason you get sent a list is so that you can remove the withdrawn participants from any data files that you have already extracted from the main database (or from bulk files). For example, if you created a CSV file with a selected Cohort using the Cohort Browser and the Table Exporter, that CSV file will not be automatically updated during a re-dispense.
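
For the population genetics data, where withdrawn participants are thought to appear with negative EIDs, a sample list could be split along the lines of the sketch below. The helper name `split_by_eid_sign` and the example IDs are hypothetical, and the sketch assumes the IDs are integers stored as strings.

```python
def split_by_eid_sign(sample_ids):
    """Separate sample IDs into retained (positive) and withdrawn (negative) EIDs."""
    kept, withdrawn = [], []
    for sample_id in sample_ids:
        if int(sample_id) < 0:
            withdrawn.append(sample_id)
        else:
            kept.append(sample_id)
    return kept, withdrawn


# Made-up sample IDs, as they might be read from the ID column of a sample file.
sample_ids = ["1001", "-1", "1003", "-2"]
kept, withdrawn = split_by_eid_sign(sample_ids)
```

The `kept` list can then be used as a keep-list when subsetting genotype data, so that negative-EID samples never enter the analysis.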

