Practical Cheminformatics with Marimo


Introduction
As someone working with drug discovery data, I frequently need a view that provides an overview and then quickly switches to a detailed perspective. I’ve long been frustrated by most software tools’ inability to provide this kind of view. Most systems for processing drug discovery data adopt a spreadsheet view that places compounds in rows and assays in columns. While spreadsheet views can be powerful, they often don’t let users switch quickly between summary and detailed views, a capability that is critical for understanding structure-activity relationships (SAR).

Over the last few months, I’ve been using marimo to create summary and detail views that help me quickly explore datasets and understand SAR. I’ve also created a GitHub repository with a few examples of how marimo can be used to create interactive cheminformatics tools. For those who haven’t read my previous post, marimo can be thought of as a “better Jupyter notebook”. It has all the features of Jupyter, plus several additions that make it easy to integrate interactivity into the notebook. There are three key features that distinguish marimo from Jupyter.

1. Reactive Execution Model

In Jupyter, cells run only when you execute them, in whatever order you choose; if you change a variable in “Cell A,” you must manually re-run “Cell B” to see the updated result. This often leads to “hidden state” bugs.

Marimo uses a directed acyclic graph (DAG) to track dependencies between cells.

  • Automatic Updates: If you modify a variable in one cell, marimo automatically re-runs all other cells that reference that variable.
  • Consistency: This ensures that the code you see on the screen always matches the output displayed, eliminating the “out-of-order execution” problem common in Jupyter.
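To make the dependency-graph idea concrete, here is a toy sketch built only on Python's standard library. It is not marimo's implementation; the cell names, the made-up similarity scores, and the helper functions are all assumptions chosen for illustration. Each toy cell declares which variables it reads and which one it defines, and a change to one variable re-runs only the cells downstream of it.

```python
from graphlib import TopologicalSorter

SCORES = (0.2, 0.4, 0.6, 0.8)  # made-up similarity scores

# Toy "cells": name -> (variables read, variable defined, function).
CELLS = {
    "set_cutoff": ((), "cutoff", lambda: 0.35),
    "count_hits": (("cutoff",), "n_hits",
                   lambda cutoff: sum(s >= cutoff for s in SCORES)),
    "report": (("n_hits",), "summary",
               lambda n_hits: f"{n_hits} compounds pass"),
    "title": ((), "heading", lambda: "Similarity screen"),
}

# Map each variable to the cell that defines it, then build the DAG.
DEFINER = {defined: name for name, (_, defined, _) in CELLS.items()}
# graphlib expects node -> set of predecessors (the cells it depends on).
GRAPH = {name: {DEFINER[var] for var in reads}
         for name, (reads, _, _) in CELLS.items()}
ORDER = list(TopologicalSorter(GRAPH).static_order())


def stale_cells(changed_var):
    """Return, in run order, the cells affected by a change to `changed_var`."""
    stale_vars, stale = {changed_var}, []
    for name in ORDER:
        reads, defined, _ = CELLS[name]
        if stale_vars.intersection(reads):
            stale.append(name)
            stale_vars.add(defined)  # its output is now stale as well
    return stale


def run(cell_names, env):
    """Run the named cells, storing each cell's output variable in `env`."""
    for name in cell_names:
        reads, defined, func = CELLS[name]
        env[defined] = func(*(env[var] for var in reads))
    return env


env = run(ORDER, {})                  # initial run of every cell
first_summary = env["summary"]
env["cutoff"] = 0.5                   # simulate editing the cutoff
to_rerun = stale_cells("cutoff")
run(to_rerun, env)
print(first_summary, "->", env["summary"], "| re-ran:", to_rerun)
```

Note that the unrelated "title" cell is never re-run, which is the point of tracking dependencies rather than re-executing everything.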

2. Pure Python File Format (.py vs .ipynb)

Jupyter notebooks are stored as JSON files (.ipynb), which include code, metadata, and raw output (like large image strings). This makes them notoriously difficult to manage with version control.

Marimo notebooks are stored as pure Python files (.py).

  • Git-Friendly: Because they are plain text, you can use standard git diff to see exactly what code changed without wading through messy JSON.
  • Execution as Scripts: You can run a marimo notebook directly from your terminal using python notebook.py. It treats the notebook as a legitimate Python script, making it easier to integrate into production pipelines.

3. Native UI Interactivity (No Callbacks)

While Jupyter requires external libraries (like ipywidgets) and complex callback functions to create interactive sliders or buttons, marimo builds this directly into the Python-variable relationship.

  • Variables as Widgets: In marimo, UI elements like sliders or dropdowns are bound directly to Python variables.
  • Instant Apps: Moving a slider automatically triggers the reactive engine to update any dependent code. This allows you to turn a notebook into a functional web app or dashboard (using marimo run) without writing any specialized “app” code or using frameworks like Streamlit.
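The second and third features are easiest to see in a saved notebook. The sketch below embeds a small, hand-written approximation of a marimo notebook file as a string (the slider name and label are made up for illustration) and then uses the standard library's ast module to show that the file is ordinary Python: each cell is a function, and the function's parameters are the variables it depends on.

```python
import ast

# Hand-written approximation of a saved marimo notebook (illustrative only).
NOTEBOOK_SOURCE = '''
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _(mo):
    # A UI element is an ordinary variable; displaying it renders the widget.
    n_clusters = mo.ui.slider(start=1, stop=20, value=5, label="Clusters")
    n_clusters
    return (n_clusters,)


@app.cell
def _(mo, n_clusters):
    # Reads n_clusters.value, so this cell re-runs when the slider moves.
    mo.md(f"Showing {n_clusters.value} clusters")
    return


if __name__ == "__main__":
    app.run()
'''

# Because the notebook is plain Python, standard tooling can inspect it.
tree = ast.parse(NOTEBOOK_SOURCE)
cells = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
cell_inputs = [[arg.arg for arg in cell.args.args] for cell in cells]
for number, inputs in enumerate(cell_inputs, start=1):
    print(f"cell {number} reads: {inputs or 'nothing'}")
```

Saved to a file such as notebook.py, source like this could be opened for editing with marimo edit notebook.py or served as an app with marimo run notebook.py. There is no callback anywhere: the third cell depends on the slider simply by naming it as an input.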

For more information on marimo, I recommend checking out these resources.

· The marimo website has a ton of great information and tutorials.
· The marimo YouTube channel has dozens of great videos where Vincent does everything from providing an overview to deep dives on specific features.

In this post, I’ll provide a quick overview of three simple but effective marimo notebooks I’ve recently created. Hopefully, this will give a sense of what you can do with marimo. In addition, I’ll highlight a few useful features in marimo-chem-utils, a pip-installable, open-source package I built to add cheminformatics features to marimo. The GitHub repository for marimo-chem-utils also includes links to demo versions of the notebooks that run on molab, marimo’s answer to Google Colab. You can run the notebooks on molab without installing any software on your computer. However, to run the notebooks in interactive mode, you must be logged in to molab. This is easy and just requires setting up a free account.

Viewing Clustering Results

Clustering provides an efficient way to organize sets of chemical structures. Although several useful clustering algorithms, such as Butina and BitBirch, are available as open-source software, I haven’t found many tools for visualizing clustering results. Fortunately, it’s easy to integrate this capability into a marimo notebook. In the examples directory of the marimo-chem-utils repository, there is a file called cluster.py that demonstrates how marimo can provide an overview/detail view of clustering results. The figure below shows the core of the interface. On the left is an overview table that shows one representative from each cluster. Clicking the checkbox to the left of a table row displays a grid of molecules from the selected cluster in the panel on the right. The great thing about marimo is that it lets you generate this view with just a few lines of code. The pandas dataframe containing the cluster representatives is passed to a marimo table object, which enables selection. Note that the marimo table view also automatically provides pagination. The structures on the right are shown using a grid class from marimo-chem-utils. This grid is a simple, lightweight wrapper around some code from the RDKit.
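The data flow behind an overview/detail view like this can be sketched in plain Python. This is not the code from cluster.py; the SMILES, the cluster assignments, and the choice of the first member as the cluster representative are all assumptions made for illustration.

```python
from collections import defaultdict

# Made-up clustering output: (SMILES, cluster id) pairs.
CLUSTERED = [
    ("c1ccccc1O", 0),
    ("c1ccccc1N", 0),
    ("Cc1ccccc1O", 0),
    ("CCO", 1),
    ("CCCO", 1),
    ("CC(=O)O", 2),
]


def build_views(clustered):
    """Return (overview rows, mapping of cluster id -> member SMILES)."""
    members = defaultdict(list)
    for smiles, cluster_id in clustered:
        members[cluster_id].append(smiles)
    # Overview: one row per cluster. Assumption: the first member of each
    # cluster serves as its representative.
    overview = [
        {"cluster": cid, "representative": smiles_list[0], "size": len(smiles_list)}
        for cid, smiles_list in sorted(members.items())
    ]
    return overview, dict(members)


overview, members = build_views(CLUSTERED)
selected_row = overview[0]  # stands in for the row a user ticks in the table
detail = members[selected_row["cluster"]]
print(selected_row, "->", detail)
```

In a marimo notebook, the overview rows could be handed to mo.ui.table with selection="single", and the ticked row read back from the table's value attribute in a downstream cell; that cell would then pass the member structures to a grid display.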

Functional Group Filtering

I’ve written a few posts in the past describing how sets of SMARTS patterns, such as the REOS filters, can be used to filter sets of chemical structures and identify compounds containing functionality that might be reactive, toxic, or problematic in assays. When running these filters, it’s often useful to review the results to make sure you’re not removing compounds you might want to keep. The script reos.py in the marimo-chem-utils examples directory uses a view similar to the one described above to enable a quick review of filtering results. In this case, we have a summary view on the left that shows the functional group filters, and a detail view on the right that shows the structures that triggered the filter, with the offending functionality highlighted. The detail view uses the same grid view we used above to show the structures. In this case, we make use of an additional argument specifying a SMARTS pattern to highlight in the structures.
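As a rough sketch of the filtering step, the snippet below uses the RDKit to find which structures trigger which SMARTS pattern and which atoms matched. It is not the code from reos.py: the two patterns are a hypothetical mini rule set rather than the real REOS filters, and the example SMILES are made up.

```python
from rdkit import Chem

# Hypothetical mini rule set; the real REOS filters contain many more patterns.
FILTERS = {
    "aldehyde": "[CX3H1](=O)[#6]",
    "nitro": "[N+](=O)[O-]",
}
SMILES = ["CC=O", "c1ccccc1[N+](=O)[O-]", "CCO"]


def apply_filters(smiles_list, filters):
    """Map each filter name to a list of (SMILES, matched atom indices)."""
    patterns = {name: Chem.MolFromSmarts(smarts) for name, smarts in filters.items()}
    hits = {name: [] for name in filters}
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # skip SMILES that the RDKit cannot parse
            continue
        for name, pattern in patterns.items():
            match = mol.GetSubstructMatch(pattern)  # empty tuple if no match
            if match:
                hits[name].append((smiles, match))
    return hits


hits = apply_filters(SMILES, FILTERS)
for name, matched in hits.items():
    print(name, [smiles for smiles, _ in matched])
```

The filter names would populate the summary table, and the matched atom indices are what a detail view needs for highlighting; with the RDKit alone, they could be passed to Draw.MolsToGridImage through its highlightAtomLists argument.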

Scatterplots with Chemical Structures

When evaluating data or examining model results, we often create a scatter plot to visualize the relationship between two properties. In other cases, we may project a molecular representation, such as chemical fingerprints, onto a two-dimensional plot using techniques like t-SNE, UMAP, or GTM to provide an overview of chemical space. In either case, it’s useful to quickly select a point or a set of points and view the associated chemical structures. The third example notebook, scatterplot.py, provides two ways for marimo to associate chemical structures with a scatterplot.

· Tooltips – In the example, a tooltip showing the chemical structure appears when the user hovers the mouse pointer over a point in the plot.
· Selection – When a set of points is selected in the plot, the corresponding chemical structures are shown in the grid on the right. This grid view uses the same function from marimo-chem-utils as above.
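The selection behavior boils down to mapping a region of the plot back to rows of the underlying data. The sketch below shows that mapping for a rectangular selection in plain Python; it is not the code from scatterplot.py, and the coordinates, SMILES, and function name are made up for illustration.

```python
# Made-up 2D coordinates (for example, from a projection) paired with SMILES.
POINTS = [
    {"x": 0.5, "y": 0.5, "smiles": "CCO"},
    {"x": 1.5, "y": 1.2, "smiles": "c1ccccc1"},
    {"x": 1.8, "y": 1.9, "smiles": "CC(=O)O"},
    {"x": 3.0, "y": 0.4, "smiles": "CCN"},
]


def select_in_box(points, x_range, y_range):
    """Return the SMILES of points inside a rectangular selection."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        point["smiles"]
        for point in points
        if x_min <= point["x"] <= x_max and y_min <= point["y"] <= y_max
    ]


selected = select_in_box(POINTS, x_range=(1.0, 2.0), y_range=(1.0, 2.0))
print(selected)
```

In marimo this bookkeeping is typically handled for you. One option is to wrap an Altair chart in mo.ui.altair_chart, whose value attribute holds the selected rows, so a downstream cell only needs to pass their structures to the grid.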

Conclusions

The three notebooks described here provide a brief overview of a few ways marimo notebooks can be integrated into cheminformatics workflows. There’s a lot more you can do, and the marimo team is constantly adding new capabilities. Over the next few months, I’ll add more capabilities and examples to marimo-chem-utils. Stay tuned. It’s going to be fun.

Acknowledgments

This work has been a collaboration with Shivam Patel at PsiThera. We would like to thank the marimo team for an excellent tool and rapid support. We can’t wait to see what’s next for marimo.