Extract record tables using the dx command line.

Hello, I would like to download the record table 1062 , containing GP prescription data. I have already used the dx extract dataset command to download other fields from the database such as the sex and the birth year of the participant selecting the fields "participant.p31,participant.p34". I tried to use as entity the "GP Prescription Record" with no success. Also the columns inside the table do not contain data field names to pass them in the command line. Can you suggest me the best way to extract record tables using the dx command line? Thank you in advance.

Comments

7 comments

  • Comment author
    Rachael W The helpers that keep the community running smoothly. UKB Community team Data Analyst

    I suggest you use dx run table-exporter, and use entity gp_scripts, with field-titles instead of field-names.

     

    You may need to install the table-exporter first, which can be done from the Tools > Tools Library tab. See https://ukbiobank.dnanexus.com/app/table-exporter

    https://documentation.dnanexus.com/developer/apps/developing-spark-apps/table-exporter-application#using-the-table-exporter-app

     

    For example, to extract all the gp_scripts columns except "Data provider", from the Apollo dataset into a csv in my main RAP project storage, I used the following command:

     

    dx run table-exporter -idataset_or_cohort_or_dashboard="app12345_20230815211802.dataset" -ioutput=rgw_scripts -ioutput_format=CSV -icoding_option=REPLACE -iheader_style=FIELD-NAME -ientity="gp_scripts" -ifield_titles="Participant ID" -ifield_titles="BNF code" -ifield_titles="Date prescription was issued" -ifield_titles="Read v2 code" -ifield_titles="dm+d code"

     

    Note that you will need to use your own dataset-record id.

    This produces a csv file that is about 2GB.

     

    The columns available in GP Prescription Record can be seen in the cohort-browser, using add-column, in folder Health related outcomes > Primary care > Record-level access.

    Alternatively, there is a description on Showcase, in Catalogues > Record Tables > 1062, tab "8 columns".

     

    So far as I can tell, there do not seem to be any field-names for these record-table columns.

    0
  • Comment author
    Former User of DNAx Community_60

    Hello Rachael,

     

    Thank you for your insights.

     

    I have tried to use the dx extract dataset command to extract specific columns using field names and field titles.

     

    For example: dx extract dataset <project-id> --field "gp_scripts.drug_name" -o GP_Prescription_Data_drug.csv

     

    It seems that it recognizes the data field.

     

    and I get an error:

    Please consider using `--sql` option to generate the SQL query and query via a private compute cluster.

    Query exceeded timeout [120]. Cancelled

     

    Is is because the data is very big to be downloaded with the dx command line?

     

    Also, regarding the table exporter tool, is the data downloaded in the RAP platform? I wish to get and use the data locally.

     

    Thanks once again.

     

     

    0
  • Comment author
    Rachael W The helpers that keep the community running smoothly. UKB Community team Data Analyst

    Hi Kleanthi,

     

    I don't know why dx extract is not working (I have never used dx extract). Somebody else might be able to explain [??} .

     

    The table exporter command that I have written will copy the gp_scripts data from the main dataset into a new file on your project storage in the RAP. Once the csv file is in your project storage, you can use the "Download" button to copy it from there to your local computer.

     

    I just noticed that I missed the drug_name and quantity columns from my example command, but you can add them to the end of it.

     

    If this fails, you could consider requesting access to gp_scripts by putting field 42039 in an AMS basket. Once the basket is generated, you could then log into AMS and download the whole gp_scripts table from there. However, there is always a delay with basket processing, generally at least a week, and at present baskets are paused for the forthcoming showcase updates, so the delay could be more than 3 weeks.

    0
  • Comment author
    Alexandra Lee DNAnexus Team

    As Racheal mentioned, you used entity title instead of entity name when you tried to extract the data which is why our original approach didn't work. To see the list of entity names and titles you can run `dx extract_dataset <project-id:record-id> --list-entities`

     

    For ?GP prescription? data you?ll use entity name `gp_scripts`. To get all the fields in this record you can run: `dx extract_dataset <project-id:record-id> --entities "gp_scripts" --list-fields`

     

    Then to extract the data for this GP prescription you can run a command using a SPARK jupyter lab like:

     

    cmd = [

        "dx",

        "extract_dataset",

        <project-id:record-id>,

        "--fields",

        "gp_scripts.bnf_code,gp_scripts.data_provider,gp_scripts.dmd_code,gp_scripts.drug_name,gp_scripts.eid,gp_scripts.issue_date,gp_scripts.quantity,gp_scripts.read_2",

        "--delimiter",

        ",",

        "--output",

        "extracted_data.sql",

        "--sql"

    ]

    subprocess.check_call(cmd)

     

    For an example of how to run SQL query in a SPARK notebook you can see the tutorial: https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/gwas-ex#extract-data-fields-used-for-deriving-phenotype-and-sample-qc

     

     

    0
  • Comment author
    Former User of DNAx Community_60

    Dear Rachael and Alexandra,

     

    Thank you again for the insights. I used table exporter to get the desirable tables. One more question. When I have run the command dx table exporter, do I need to log out or delete or pause the app TableExporter afterwards in order not to use computational resources from UK BB?

    0
  • Comment author
    Rachael W The helpers that keep the community running smoothly. UKB Community team Data Analyst

    Hi Kleanthi, the table exporter stops when it has finished (it doesn't continue using resources).

    0
  • Comment author
    Former User of DNAx Community_60

    In the dx run table-exporter command since we want to use the field titles we use : -iheader_style=FIELD-TITLE.

    1

Please sign in to leave a comment.