Performing Exploratory Data Analysis on the OpenADMET ExpansionRx Blind Challenge Dataset


Published: November 08, 2025

When working with a new dataset, many people jump straight into building a machine learning model. I prefer to start with exploratory data analysis (EDA) to gain a deeper understanding of the data. To that end, I created a notebook that performs initial EDA on the OpenADMET ExpansionRx Blind Challenge Dataset.

Instead of using Jupyter for this analysis, I’m using marimo, a new open source data science notebook environment that enables the creation of interactive data apps with minimal code. I think of marimo as a “better Jupyter” because it offers several features that simplify building interactive data apps, including built-in support for Altair charts, an enhanced table view, and interactive widgets.

I’m working on a repository titled “Practical Cheminformatics with marimo,” which demonstrates some ways to use marimo for cheminformatics tasks. This code should be ready in a couple of weeks. Please consider this notebook a preview of what’s to come. For those interested in learning more about marimo, I recommend starting with the following resources.

Marimo Documentation: https://marimo.readthedocs.io/en/latest/
Marimo YouTube Channel: https://www.youtube.com/c/MarimoDataScience

There are a few aspects of marimo that might confuse some long-time Jupyter users (including me):

The output of a code cell in marimo appears above the cell in the notebook instead of below it, as in Jupyter.
A marimo notebook is reactive, meaning that when you change the value of a variable, any code cells that depend on that variable will automatically update. This differs from Jupyter, where you need to manually re-run code cells to see the updated output. It also means that a variable can only be defined once in a marimo notebook.
When you run a marimo notebook, it first checks if you have the necessary libraries installed. If not, marimo will ask if you’d like to install them and will install them for you. This makes it easy to share marimo notebooks with others without worrying about dependencies. This notebook has the dependencies inlined; if you run it using the --sandbox flag, marimo will create a sandboxed environment and automatically install the dependencies.
Once the notebook loads, you can run the cells by clicking on the run button associated with each cell or hitting shift-return (just like Jupyter). You can also click on the yellow run button in the bottom right to run the whole notebook.
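To make the reactive model a bit more concrete, here is a toy sketch of the idea. It is not marimo's implementation, and the function names (defs_and_refs, cells_to_rerun) and the four example cells are hypothetical. It uses Python's ast module to work out which variables each cell assigns and reads, rejects a variable that is assigned in two cells, and lists the cells that would need to re-run after one cell changes.

```python
import ast


def defs_and_refs(source: str) -> tuple[set[str], set[str]]:
    """Return (names the cell assigns, names it reads from other cells)."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else refs).add(node.id)
    return defs, refs - defs


def cells_to_rerun(cells: dict[str, str], changed: str) -> list[str]:
    """List the cells that depend, directly or indirectly, on `changed`."""
    info = {name: defs_and_refs(src) for name, src in cells.items()}

    # A variable may be assigned in only one cell; otherwise the
    # dependency between cells would be ambiguous.
    owner: dict[str, str] = {}
    for name, (defs, _) in info.items():
        for var in defs:
            if var in owner:
                raise ValueError(f"{var!r} is defined in both {owner[var]} and {name}")
            owner[var] = name

    # Walk downstream from the changed cell, collecting every cell that
    # reads a variable assigned by a cell already known to be stale.
    stale: list[str] = []
    frontier = [changed]
    while frontier:
        current_defs = info[frontier.pop()][0]
        for name, (_, refs) in info.items():
            if name != changed and name not in stale and refs & current_defs:
                stale.append(name)
                frontier.append(name)
    return stale


# Hypothetical four-cell notebook, used only for illustration.
cells = {
    "cell_1": "x = 2",
    "cell_2": "w = 7",
    "cell_3": "y = x + 1",
    "cell_4": "z = y * 10",
}
print(cells_to_rerun(cells, "cell_1"))  # ['cell_3', 'cell_4']
```

Editing cell_1 marks cell_3 and cell_4 as stale, while cell_2 is left alone because it never reads x. Adding a fifth cell that assigns x again would raise the ValueError, which mirrors the "defined once" rule described above.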

Using the marimo notebook is easy. Just follow these simple steps.

1. Download the notebook from GitHub. Note that marimo notebooks are simply Python files with a .py extension. You can download the file from the command line with this command.

wget https://raw.githubusercontent.com/PatWalters/practical_cheminformatics_posts/refs/heads/main/expansion_data_exploration/openadmet_expansion_exploration.py

2. Install marimo and uv using the following command:


pip install uv marimo

3. Use the marimo command to run the notebook. This command installs all the dependencies and launches the marimo notebook in a sandboxed environment.


marimo edit openadmet_expansion_exploration.py --sandbox

4. Enjoy!

You can also try out the notebook on molab, marimo’s answer to Google Colab.

Go to https://molab.marimo.io and sign up for a free account.
To run the notebook on molab click here

Where’s the Code?

The code and notebook for this post can be found in this GitHub repo: https://github.com/PatWalters/practical_cheminformatics_posts/tree/main/expansion_data_exploration

Acknowledgements

Thanks to Hugo MacDermott-Opeskin for testing the notebook. I wouldn’t have tried marimo if it weren’t for blogs by Eric Ma and Srijit Seal. Those guys are a constant source of inspiration. Thanks to Ramon Miranda-Quintana, Ignacio Pickering, and Kenneth Lopez Perez for their help with the BitBIRCH and BBLean clustering methods I used in the notebook. Special thanks to the marimo team. They’ve created an amazing tool, their support is fantastic, and Vincent’s videos are the best!


Pat Walters
