Isn't it about time the JupyterLab instance got an overhaul?

Python 3.6? Hail 0.2.78, which is 14 months out of date? JupyterLab 2.x? Every attempt I make at updating manually ends in a broken pyspark/dxpy dependency circus.

Comments

28 comments

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Thank you for the feedback. I sent this note directly to the lead product manager. I will leave this thread open in case anyone has a workaround, but we will follow up to make sure this gets officially updated.

  • Comment author
    Former User of DNAx Community_51

    Hi @Jakob Madsen, we have got the latest Hail version working on a Spark cluster in a DNAnexus app. It would be very helpful if you or anyone else reading this could test whether our app (https://github.com/lindgrengroup/hail-on-dnanexus) works in dx projects other than our own. Whatever the result, please report below, and hopefully I can make a main post announcing this app!

     

    Many Thanks,

    Barney

  • The current version of Hail is incompatible with the UKB WES BGEN compression format (zlib vs zstd, discussed here: https://community.dnanexus.com/s/question/0D5t000004AflSiCAJ/hail-troubleshooting-for-ukb-data). This will be fixed by this PR (https://github.com/hail-is/hail/pull/12576), which I was told should be released in a day or two. It would be really helpful to get an update to the newest version once that's added.

  • zstd support will be included in Hail 0.2.108

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Do you have an estimate of when this will be released?

     

  • Not sure. I'm following the release here: https://github.com/hail-is/hail/pull/12591

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Sounds good. Thanks!

  • Nice, I am following that release thread and will update the app ASAP after the Hail release.

  • @Chai Fungtammasan, a crazy suggestion: would it make sense for you guys to reach out to Tim and Dan from Hail? From a brief search, it seems that ~40 topics here are about Hail and ~50 questions on the Hail forums are about UKB and RAP. It might be a way to address some of the issues and weird behaviour (like the one mentioned above).

  • Comment author
    Chai Fungtammasan DNAnexus Team

    I think this is a good suggestion. I will discuss it internally and try to reach out to them.

  • Hail 0.2.108 is released and working! This is great news: previously we were unable to analyse UKB imputed data in Hail without errors in JupyterLab...

  • Does anyone at DNAnexus know if supporting a rolling release of Hail in JupyterLab is in the works? We've shown it's possible, and we would prefer not to spend this time maintaining usability for your platform.

     

    Many Thanks,

    Barney

  • Comment author
    Chai Fungtammasan DNAnexus Team

    @Barney Hill, a DNAnexus engineer is updating Hail, and we might as well update to this version rather than the previous one.

    @Ondrej Klempir has been testing your applet to see whether it would fix several issues we saw. He ran into a few sporadic issues and will share his suggestions with you.

  • Comment author
    Ondrej Klempir DNAnexus Team

    Hi @Barney Hill, I have run a couple of tests on the UKB imputation BGEN files. I was able to reproduce all the components of this UKB RAP OpenBio notebook: https://github.com/dnanexus/OpenBio/blob/master/hail_tutorial/BGEN_import.ipynb. Before saving the Hail MatrixTable (mt) to dnax, I filtered by sample to reduce the dimensions.

     

    In my experiments, Hail 0.2.108 can directly load zstd UKB BGEN:

    -- ukb22828 GRCh37 UKB imputation from genotype

    -- ukb21007 GRCh38 TOPMed

    -- ukb21008 GRCh38 GEL

     

    During the index_BGEN step, I ran into a Spark zstd (de)compression issue. Switching to the lz4 codec made the BGEN indexing step work. I just added the following line to the SparkSession builder:

    .config("spark.shuffle.mapStatus.compression.codec", "lz4") 
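
For anyone who wants to try this, here is a minimal sketch of where that line could sit. It assumes the common pattern of creating the SparkSession first and then handing its SparkContext to Hail. The helper names `lz4_shuffle_conf` and `start_hail_with_lz4` are hypothetical, and the pyspark and hail imports are deferred so the config helper can be checked without a cluster.

```python
# Sketch only: the helper names below are hypothetical, not part of Hail, Spark, or DNAnexus.
MAP_STATUS_CODEC_KEY = "spark.shuffle.mapStatus.compression.codec"


def lz4_shuffle_conf(extra=None):
    """Return Spark settings with map-status compression switched to lz4."""
    conf = dict(extra or {})
    # Set the codec last so that it always wins over anything passed in `extra`.
    conf[MAP_STATUS_CODEC_KEY] = "lz4"
    return conf


def start_hail_with_lz4(extra=None):
    """Build a SparkSession with the lz4 setting, then initialise Hail on it."""
    # Imported lazily: these packages are only expected on the Spark cluster image.
    from pyspark.sql import SparkSession
    import hail as hl

    builder = SparkSession.builder
    for key, value in lz4_shuffle_conf(extra).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Assumption: Hail is attached to the existing SparkContext via hl.init(sc=...).
    hl.init(sc=spark.sparkContext)
    return spark
```

The setting has to be applied before the session is created, so in a notebook this would run in the first cell, before any Hail call.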

     

    Moreover, the new Hail version produces no errors when importing WGS pVCFs.

     

    -----------

    A couple of days ago I hit a problem with the setuptools Python package during the applet initialization phase, so the applet stopped working. It was fixed when we added:

    python3 -m pip install --upgrade pip

    python3 -m pip install --upgrade setuptools
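
If the upgrade has to run from inside a notebook or a Python start-up script rather than a shell, the same two commands can be issued from Python. This is only a sketch, not the applet's actual start-up code, and `pip_upgrade_command` and `upgrade_packages` are hypothetical helper names.

```python
import subprocess
import sys


def pip_upgrade_command(package):
    """Build the equivalent of `python3 -m pip install --upgrade <package>`."""
    # sys.executable targets the interpreter that is running this script.
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


def upgrade_packages(packages=("pip", "setuptools")):
    """Upgrade each package in order, raising on the first failure."""
    for package in packages:
        subprocess.run(pip_upgrade_command(package), check=True)
```

Calling `upgrade_packages()` upgrades pip first and then setuptools, in the same order as the two lines above.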

     

    All of this has been resolved in your recent update to Hail 0.2.108!

  • Thank you very much,

    Yeah, something went funky with the latest release of setuptools the other day: https://github.com/pypa/setuptools/issues/3772. I should also flag that I've experienced an issue with the latest Hail: https://github.com/hail-is/hail/issues/12608. From the error, I'm fairly sure this is on the Hail side.

  • Comment author
    Ondrej Klempir DNAnexus Team

    Thank you, @Barney Hill.

    The error you are reporting at https://github.com/hail-is/hail/issues/12608 (Exception in thread "map-output-dispatcher-0" Exception in thread "map-output-dispatcher-1") is the same message I saw. In the meantime, you could add .config("spark.shuffle.mapStatus.compression.codec", "lz4") to the SparkSession builder and see if this resolves the issue.

  • Comment author
    Former User of DNAx Community_38

    Would it be possible to make the dxjupyterlab and dxjupyterlab_spark_cluster apps open source like many of the other apps on the platform are? Especially if we're going to rely on the community to keep things updated.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    That's a good suggestion. Let me ask the engineering and security teams whether it's okay with them.

    Note that they are working on updating it, though I agree that it's not fast enough.

  • Comment author
    Former User of DNAx Community_50

    @Chai Fungtammasan, any updates on this? Or even better, an actual ETA? It has been two months at this point.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    The tentative timeline for the official product update is the end of March, or April if there is a delay.

  • Comment author
    Former User of DNAx Community_38

    @Chai Fungtammasan, going forward, should we expect some sort of update schedule?

  • Comment author
    Chai Fungtammasan DNAnexus Team

    I can pass along the feedback to the product team that a release schedule would be useful for users. Next week, you will all get an announcement of new apps, including the Regenie app and 4-5 Nvidia apps.

    In practice, though, it's a bit challenging to keep to a product release schedule. When my team finds major usability issues in alpha testing, we have to send them back to be fixed before release. JupyterLab is complex because it has tons of libraries inside it.

    I am trying to push the open source idea, so that maybe someone on my team could just update some libraries and share the result without having to worry about breaking changes for every possible use case. This needs to go through security review, so it will take some time.

  • @Chai Fungtammasan, any updates on this?

  • Comment author
    Chai Fungtammasan DNAnexus Team

    The update of the official JupyterLab with Hail on DNAnexus has been postponed to around the end of Q2.

  • I've gotten to the point where I'm not relying on this app, but it really is the path of least resistance for getting developers started with the RAP. I really don't think its support should be neglected like this if the goal of the platform is to expand access.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    Thanks for the honest feedback, Ross. I understand that this is frustrating. I will share the concern with the internal team.

    I can't really say much about internal company work, but I can tell you that we haven't put this on the back burner. We received several complaints about the previous JupyterLab + Hail setup concerning API call load, libraries that aren't working, and resource management. The engineering team has been working on these to improve scalability, properly benchmark resource requirements, improve documentation, and expand testing. My team is also involved in building test cases and reviewing documentation.

     

    Meanwhile, I think the community-developed tools are the best option. We will make an official announcement when the new JupyterLab comes out, and I will also post in this thread so that you all get a notification in case you didn't sign up for the newsletter.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    @Ross DeVito, @Jakob Madsen, @Barney Hill: we want to let you know that the updated version of JupyterLab should come out in July. It will include version updates for Hail and all the major underlying libraries. It passed testing, but the loading time is too long, so we are working on that. There will be an official announcement, including documentation.

  • Comment author
    Chai Fungtammasan DNAnexus Team

    The update happened at the end of last month. I just want to let you all know in case you want to try out the new libraries.

