Practical Cheminformatics with Marimo

7 minute read

Published: January 17, 2026

Introduction
As someone working with drug discovery data, I frequently need a view that provides an overview and then quickly switches to a detailed perspective. I’ve long been frustrated by most software tools’ inability to provide this kind of view. Most systems for processing drug discovery data adopt a spreadsheet view that places compounds in rows and assays in columns. While spreadsheet views can be powerful, they often don’t enable users to quickly switch between summary and detailed views, which are critical for understanding structure-activity relationships (SAR).

Over the last few months, I’ve been using marimo to create summary and detail views that help me quickly explore datasets and understand SAR. I’ve also created a GitHub repository with a few examples of how marimo can be used to create interactive cheminformatics tools. For those who haven’t read my previous post, marimo can be thought of as a “better Jupyter notebook”. It has all the features of Jupyter, plus several additions that make it easy to integrate interactivity into the notebook. There are three key features that distinguish marimo from Jupyter.

1. Reactive Execution Model

In Jupyter, cells run only when you execute them manually, in whatever order you choose; if you change a variable in “Cell A,” you must re-run “Cell B” yourself to see the updated result. This often leads to “hidden state” bugs.

Marimo uses a directed acyclic graph (DAG) to track dependencies between cells.

· Automatic Updates: If you modify a variable in one cell, marimo automatically re-runs all other cells that reference that variable.
· Consistency: This ensures that the code you see on the screen always matches the output displayed, eliminating the “out-of-order execution” problem common in Jupyter.
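
The dependency tracking described above can be illustrated with a small toy model. This is only a sketch of the idea, not marimo's implementation: the four-line "cells", the `defs_and_refs` helper, and the `cells_to_rerun` function are all hypothetical names invented for this example. It uses the standard library `ast` module to find which names each cell assigns and reads, then walks the resulting graph in topological order to find the cells that become stale when one cell changes.

```python
import ast
from graphlib import TopologicalSorter

# Hypothetical notebook: cell name -> source code (one statement per cell).
CELLS = {
    "a": "threshold = 0.5",
    "b": "scores = [0.2, 0.7, 0.9]",
    "c": "best = max(scores)",
    "d": "is_hit = best > threshold",
    "e": "label = 'hit' if is_hit else 'miss'",
}


def defs_and_refs(source):
    """Return (names assigned, names read) for one cell's source."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else refs).add(node.id)
    return defs, refs - defs


def build_graph(cells):
    """Map each cell to the set of cells whose variables it reads."""
    info = {name: defs_and_refs(src) for name, src in cells.items()}
    owner = {var: name for name, (defs, _) in info.items() for var in defs}
    return {
        name: {owner[var] for var in refs if var in owner}
        for name, (_, refs) in info.items()
    }


def cells_to_rerun(cells, changed):
    """Cells downstream of `changed`, listed in a valid execution order."""
    deps = build_graph(cells)
    order = list(TopologicalSorter(deps).static_order())
    stale = {changed}
    for name in order:
        # A cell is stale if anything it depends on is stale.
        if deps[name] & stale:
            stale.add(name)
    return [name for name in order if name in stale and name != changed]


print(cells_to_rerun(CELLS, "b"))  # editing `scores` invalidates c, d, and e
```

Editing cell `a` (the threshold) would only invalidate cells `d` and `e`, since cell `c` never reads `threshold`. That selective re-execution is the practical benefit of tracking dependencies as a graph instead of re-running the notebook from the top.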

2. Pure Python File Format (.py vs .ipynb)

Jupyter notebooks are stored as JSON files (.ipynb), which include code, metadata, and raw output (like large image strings). This makes them notoriously difficult to manage with version control.

Marimo notebooks are stored as pure Python files (.py).

· Git-Friendly: Because they are plain text, you can use standard git diff to see exactly what code changed without wading through messy JSON.
· Execution as Scripts: You can run a marimo notebook directly from your terminal using python notebook.py. Because the notebook is an ordinary Python script, it is easier to integrate into production pipelines.

3. Native UI Interactivity (No Callbacks)

While Jupyter requires external libraries (like ipywidgets) and callback functions to create interactive sliders or buttons, marimo ties UI elements directly to Python variables, so no callbacks are needed.

· Variables as Widgets: In marimo, UI elements like sliders or dropdowns are bound directly to Python variables.
· Instant Apps: Moving a slider automatically triggers the reactive engine to update any dependent code. This allows you to turn a notebook into a functional web app or dashboard (using marimo run) without writing any specialized “app” code or using frameworks like Streamlit.
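
To make the file format and the widget binding more concrete, here is a rough sketch of what a small marimo notebook might look like on disk, held in a string so the example runs without marimo installed. The notebook contents (a `cutoff` slider and a cell that displays its value) are hypothetical, and `describe_cells` is a helper invented for this illustration. Because the notebook is plain Python, the standard library `ast` module can read each cell's inputs (the function parameters) and outputs (the returned tuple).

```python
import ast

# Hypothetical marimo notebook as it might be saved to notebook.py.
NOTEBOOK_SOURCE = '''
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _(mo):
    cutoff = mo.ui.slider(start=0, stop=10, value=5, label="Cutoff")
    cutoff
    return (cutoff,)


@app.cell
def _(cutoff, mo):
    mo.md(f"Current cutoff: {cutoff.value}")
    return


if __name__ == "__main__":
    app.run()
'''


def describe_cells(source):
    """List (inputs, outputs) for every function decorated with @app.cell."""
    cells = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        if not any(ast.unparse(d) == "app.cell" for d in node.decorator_list):
            continue
        inputs = [arg.arg for arg in node.args.args]
        outputs = []
        for stmt in node.body:
            # A cell exposes variables to other cells by returning them as a tuple.
            if isinstance(stmt, ast.Return) and isinstance(stmt.value, ast.Tuple):
                outputs = [e.id for e in stmt.value.elts if isinstance(e, ast.Name)]
        cells.append((inputs, outputs))
    return cells


for inputs, outputs in describe_cells(NOTEBOOK_SOURCE):
    print(f"reads {inputs} -> defines {outputs}")
```

In this sketch the third cell takes `cutoff` as a parameter, so it is the cell that re-runs when the slider moves. No callback is registered anywhere; the dependency is expressed by the variable name alone.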

For more information on marimo, I recommend checking out these resources.

· The marimo website has a ton of great information and tutorials.
· The marimo YouTube channel has dozens of great videos where Vincent does everything from providing an overview to deep dives on specific features.

In this post, I’ll provide a quick overview of three simple but effective marimo notebooks I’ve recently created. Hopefully, this will give a sense of what you can do with marimo. In addition, I’ll highlight a few useful features in marimo-chem-utils, a pip-installable, open-source package I built to add cheminformatics features to marimo. The GitHub repository for marimo-chem-utils also includes links to demo versions of the notebooks that run on molab, marimo’s answer to Google Colab. You can run the notebooks on molab without installing any software on your computer. However, to run the notebooks in interactive mode, you must be logged in to molab. This is easy and just requires setting up a free account.

Viewing Clustering Results

Clustering provides an efficient way to organize sets of chemical structures. Although several useful clustering algorithms, such as Butina and BitBirch, are available as open-source software, I haven’t found many tools for visualizing clustering results. Fortunately, it’s easy to integrate this capability into a marimo notebook. In the examples directory of the marimo-chem-utils repository, there is a file called cluster.py that demonstrates how marimo can provide an overview/detail view of clustering results. The figure below shows the core of the interface. On the left is an overview table that shows one representative from each cluster. Clicking the checkbox to the left of a table row displays a grid of molecules from the selected cluster in the panel on the right. The great thing about marimo is that it lets you generate this view with just a few lines of code. The pandas dataframe containing the cluster representatives is passed to a marimo table object, which enables selection. Note that the marimo table view also automatically provides pagination. The structures on the right are shown using a grid class from marimo-chem-utils. This grid is a simple, lightweight wrapper around some code from the RDKit.
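
The data shaping behind an overview/detail view like this can be sketched in a few lines. This is not the code from cluster.py: plain Python lists and dicts stand in for the pandas dataframe, the SMILES and cluster ids are made up, and treating the first member of each cluster as its representative is an assumption made for the example. In the notebook, the marimo table would supply the selected rows and the grid would draw the structures; here `detail_for` simply returns the SMILES.

```python
from collections import defaultdict

# Hypothetical clustering output: (SMILES, cluster id) pairs.
RECORDS = [
    ("c1ccccc1O", 0),
    ("c1ccccc1N", 0),
    ("c1ccccc1C", 0),
    ("CCO", 1),
    ("CCCO", 1),
    ("C1CCNCC1", 2),
]


def build_views(records):
    """Return (overview rows, cluster id -> member SMILES)."""
    members = defaultdict(list)
    for smiles, cluster_id in records:
        members[cluster_id].append(smiles)
    # One row per cluster, largest clusters first.
    # Assumption: the first member listed is the cluster representative.
    overview = [
        {"cluster": cid, "representative": smi_list[0], "size": len(smi_list)}
        for cid, smi_list in sorted(members.items(), key=lambda kv: -len(kv[1]))
    ]
    return overview, dict(members)


def detail_for(selected_rows, members):
    """Collect the molecules for whichever overview rows are selected."""
    return [smi for row in selected_rows for smi in members[row["cluster"]]]


overview, members = build_views(RECORDS)
selected = [overview[1]]  # stands in for the row ticked in the table
print(detail_for(selected, members))
```

Keeping the overview and the member lookup as two separate structures mirrors the two panels in the interface: the table only ever holds one row per cluster, and the detail panel is rebuilt from the lookup whenever the selection changes.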

Functional Group Filtering

I’ve written a few posts in the past describing how sets of SMARTS patterns, such as the REOS filters, can be used to filter sets of chemical structures and identify compounds containing functionality that might be reactive, toxic, or problematic in assays. When running these filters, it’s often useful to review the results to make sure you’re not removing compounds you might want to keep. The script reos.py in the marimo-chem-utils examples directory uses a view similar to the one described above to enable a quick review of filtering results. In this case, we have a summary view on the left that shows the functional group filters, and a detail view on the right that shows the structures that triggered the filter, with the offending functionality highlighted. The detail view uses the same grid view we used above to show the structures. In this case, we make use of an additional argument specifying a SMARTS pattern to highlight in the structures.

Scatterplots with Chemical Structures

When evaluating data or examining model results, we often create a scatter plot to visualize the relationship between two properties. In other cases, we may project a molecular representation, such as chemical fingerprints, onto a two-dimensional plot using techniques like t-SNE, UMAP, or GTM to provide an overview of chemical space. In either case, it’s useful to quickly select a point or a set of points and view the associated chemical structures. The third example notebook, scatterplot.py, provides two ways for marimo to associate chemical structures with a scatterplot.

· Tooltips – In the example, a tooltip showing a chemical structure is displayed when the user hovers the mouse pointer over a point in the plot.
· Selection – When a set of points is selected in the plot, the corresponding chemical structures are shown in the grid on the right. This grid view uses the same function from marimo-chem-utils as above.
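
The selection behavior can be thought of as a simple filter from plot coordinates back to structures. The sketch below is a plain-Python stand-in for that idea, not the code in scatterplot.py: the points, their coordinates, and the `select_in_box` helper are hypothetical, and a rectangular selection is assumed.

```python
# Hypothetical 2D projection: each point carries its coordinates and SMILES.
POINTS = [
    {"x": 0.1, "y": 0.2, "smiles": "CCO"},
    {"x": 0.4, "y": 0.5, "smiles": "c1ccccc1"},
    {"x": 0.9, "y": 0.8, "smiles": "CC(=O)O"},
    {"x": 0.5, "y": 0.45, "smiles": "CCN"},
]


def select_in_box(points, x_range, y_range):
    """Return the SMILES of points inside an inclusive rectangular selection."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        p["smiles"]
        for p in points
        if x_min <= p["x"] <= x_max and y_min <= p["y"] <= y_max
    ]


# Stands in for dragging a box over the middle of the plot.
print(select_in_box(POINTS, (0.3, 0.6), (0.4, 0.6)))
```

The list returned here is what would be handed to the structure grid; in the notebook, the chart selection plays the role of the two ranges.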

Conclusions

The three notebooks described here provide a brief overview of a few ways marimo notebooks can be integrated into cheminformatics workflows. There’s a lot more you can do, and the marimo team is constantly adding new capabilities. Over the next few months, I’ll add more capabilities and examples to marimo-chem-utils. Stay tuned. It’s going to be fun.

Acknowledgments

This work has been a collaboration with Shivam Patel at PsiThera. We would like to thank the marimo team for an excellent tool and rapid support. We can’t wait to see what’s next for marimo.



Pat Walters
