How can I create a list of all files available in a project in the RAP? Thanks!

Comments

  • Comment author
    Anastazie Sedlakova DNAnexus Team

    Hello, you can run the following code in a JupyterLab session launched in the project of interest. I am using the folder Bulk as an example.

     

    import csv

    import dxpy
    from dxpy.utils.resolver import resolve_existing_path

    # Resolve the folder name to a project ID and an absolute folder path.
    project, folder, _ = resolve_existing_path("Bulk")


    def list_folder(folder, project):
        """Recursively yield [id, name, folder, size] for every file under folder."""
        print("Listing {}".format(folder))
        output = dxpy.api.project_list_folder(
            project,
            input_params={
                "folder": folder,
                "describe": {"fields": {"id": True, "name": True, "size": True}},
            },
        )

        # Files located directly in this folder.
        for obj in output['objects']:
            if obj['id'].startswith('file-'):
                desc = obj["describe"]
                yield [desc['id'], desc['name'], folder, str(desc['size'])]

        # Descend into each subfolder.
        for subfolder in output['folders']:
            yield from list_folder(subfolder, project)


    # Write one tab-separated row per file.
    with open("files.txt", "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["id", "name", "folder", "size"])
        for row in list_folder(folder, project):
            writer.writerow(row)

     

    The resulting tab-separated table will have the following format:

     

    id        name                   folder                        size
    file-ID1  1234567_20253_2_0.zip  /Bulk/Brain MRI/T2 FLAIR/10   36181516
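    Once the listing exists, it can be read back with the standard csv module. The following is only a minimal sketch, assuming the file is named files.txt and has the four tab-separated columns shown above. The helper name summarize_listing is hypothetical, and treating a non-numeric size as 0 is an assumption, not something the platform guarantees.

```python
import csv
from collections import defaultdict


def summarize_listing(path="files.txt"):
    """Return (file_count, total_bytes, bytes_per_folder) for a tab-separated listing.

    Hypothetical helper: expects the header row id, name, folder, size.
    """
    per_folder = defaultdict(int)
    count = 0
    total = 0
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            # Assumption: a blank or non-numeric size is counted as 0 bytes.
            size = int(row["size"]) if row["size"].isdigit() else 0
            count += 1
            total += size
            per_folder[row["folder"]] += size
    return count, total, dict(per_folder)
```

    Calling summarize_listing("files.txt") returns the number of files, the total size in bytes, and a dictionary mapping each folder path to the combined size of the files directly inside it.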

     

    The Bulk folder in my project has about 900,000 files, and writing the list took about 15 minutes on a mem1_hdd1_v2_x16 instance.
