Practical Cheminformatics Index
Unified index of posts from Blogger and GitHub Pages.
Total posts: 87
Generated on 2026-04-11.
| Date | Title | Source | Keywords | Description |
|---|---|---|---|---|
| 2026-02-07 | AI in Drug Discovery - Please Stop Fishing in the Bathtub! | GitHub | AI, Drug Discovery, Benchmarking | Critiques the misuse of the DUD-E dataset for validating machine learning models, arguing that models often “cheat” by recognizing simple chemical patterns rather than actual protein-ligand interactions. |
| 2026-01-17 | Practical Cheminformatics with Marimo | GitHub | Marimo, Jupyter, Python, Visualization | Introduces Marimo, a reactive Python notebook, as a superior alternative to Jupyter for cheminformatics with features like reactive execution and interactive UI elements. |
| 2026-01-09 | Can Gemini Search the ChEMBL Database? | GitHub | Gemini, LLM, ChEMBL, SQL | Evaluates the ability of the Gemini LLM to generate SQL queries for the ChEMBL database, noting that while it succeeds at query construction, data quality remains a limiting factor. |
| 2025-11-08 | Performing Exploratory Data Analysis on the OpenADMET ExpansionRx Blind Challenge Dataset | GitHub | EDA, OpenADMET, ADMET, Data Analysis | Provides a guide and notebook for performing EDA on the OpenADMET ExpansionRx dataset, emphasizing the importance of understanding data before modeling. |
| 2025-11-03 | We Still Haven’t Found What We’re Looking For - The Continuing Evolution of Protein-Ligand Co-Folding Methods | GitHub | Cofolding, Protein-Ligand, AlphaFold | Discusses the evolution of protein-ligand co-folding methods like PEARL and OpenFold3, noting their ongoing struggles with novel ligands and binding pockets. |
| 2025-09-20 | Just Because You Published It Doesn’t Mean It’s Right | GitHub | Reproducibility, Publication, Benchmarking | Critiques aqueous solubility modeling papers that use misleadingly “easy” benchmarks and advocates for higher quality, certified datasets. |
| 2025-09-15 | Time For a New Adventure | GitHub | Career | Announces the author’s new role as Chief Scientist at OpenADMET and outlines the initiative’s mission to improve drug metabolism and toxicity prediction through open science. |
| 2025-07-27 | Redoing the Boltz-1 Analysis of Orthosteric and Allosteric Ligand Cofolding with Boltz-2 | GitHub | Boltz-1, Boltz-2, Cofolding, Allosteric | Updates a previous analysis with the Boltz-2 model, concluding that allosteric pose prediction remains a significant challenge despite model improvements. |
| 2025-07-21 | Three Papers Demonstrating That Cofolding Still Has a Ways to Go | GitHub | Cofolding, Protein-Ligand | Summarizes three studies highlighting the limitations of current co-folding models, including their reliance on training set similarity and failure to identify allosteric sites. |
| 2025-05-18 | GNN’s can extrapolate for some properties, but there’s a trick | GitHub | GNN, Machine Learning, Extrapolation | Explains how Graph Neural Networks (GNNs) can extrapolate properties like molecular weight using “norm aggregation,” correcting a previous finding. |
| 2025-05-12 | Useful RDKit Utils - A Mötley Collection of Helpful Routines | GitHub | RDKit, Python, Utilities | Provides a collection of helpful Python routines for common cheminformatics tasks using the RDKit library. |
| 2025-05-06 | The Trouble With Tautomers | GitHub | Tautomers, RDKit, Cheminformatics | Discusses the challenges of handling tautomers in cheminformatics and demonstrates how to use RDKit for tautomer enumeration and standardization. |
| 2025-04-26 | Why Don’t Machine Learning Models Extrapolate? | GitHub | Machine Learning, Extrapolation | Explores the fundamental reasons why traditional machine learning models often fail to generalize beyond the chemical space of their training data. |
| 2025-04-26 | We’ve Moved | Blogger | General | Announces the blog’s move from Blogger to a new Markdown-based site on GitHub Pages. |
| 2025-03-07 | Even More Thoughts on ML Method Comparisons | Blogger | Machine Learning, Benchmarking | Emphasizes the need for rigorous statistical testing and better visualization when comparing machine learning models in drug discovery. |
| 2024-11-16 | Some Thoughts on Splitting Chemical Datasets | Blogger | Data Splitting, Machine Learning, Validation | Examines how different data-splitting strategies (Random, Scaffold, Butina, UMAP) affect the evaluation and generalization of machine learning models. |
| 2024-10-08 | Silly Things Large Language Models Do With Molecules | Blogger | LLM, SMILES, Cheminformatics | Critiques the use of general-purpose LLMs for molecule generation, showing they often rely on simple string manipulation rather than chemical understanding. |
| 2024-09-30 | Digging Deeper into Thompson Sampling - A Guest Blog Post by Patrick Riley | Blogger | Thompson Sampling, Active Learning, Optimization | Analyzes why certain search tasks, like docking, are significantly more difficult to optimize using Thompson Sampling than others. |
| 2024-05-22 | Generative Molecular Design Isn’t As Easy As People Make It Look | Blogger | Generative Models, Molecular Design | Discusses the extensive “child-proofing” and post-processing required to make generative AI models produce physically valid and chemically stable molecules. |
| 2024-01-16 | AI in Drug Discovery - A Highly Opinionated Literature Review (Part III) | Blogger | AI, Drug Discovery, Literature Review | Provides a curated list of review articles from 2023 covering property prediction, docking, and multi-objective optimization. |
| 2024-01-08 | AI in Drug Discovery - A Highly Opinionated Literature Review (Part II) | Blogger | AI, Drug Discovery, Literature Review | Reviews 2023 developments in LLMs, Active Learning, and Explainable AI within the context of drug discovery. |
| 2024-01-02 | AI in Drug Discovery 2023 - A Highly Opinionated Literature Review (Part I) | Blogger | AI, Drug Discovery, Literature Review | Focuses on 2023 advancements and critiques in docking, AlphaFold2, and benchmarking methodologies. |
| 2023-12-07 | Some Thoughts on Biotech vs Pharma for Computational Chemists | Blogger | Career, Biotech, Pharma | Compares career paths in biotech and pharma, discussing mentorship, software access, and the challenges of managing AI hype. |
| 2023-11-27 | Comparing Classification Models - You’re Probably Doing It Wrong | Blogger | Machine Learning, Classification, Metrics | Critiques common mistakes in comparing classification models and suggests more robust metrics and validation strategies. |
| 2023-08-03 | We Need Better Benchmarks for Machine Learning in Drug Discovery | Blogger | Benchmarking, Machine Learning, Drug Discovery | Advocates for the development of more realistic and challenging benchmarks that reflect the complexities of actual drug discovery projects. |
| 2023-07-05 | A Simple Tool for Exploring Structural Alerts | Blogger | Structural Alerts, Toxicity, Screening | Introduces a tool for identifying, visualizing, and understanding structural alerts to aid in toxicity and screening assessments. |
| 2023-06-12 | Getting Real with Molecular Property Prediction | Blogger | Property Prediction, Machine Learning | Discusses the practical challenges of predicting molecular properties and the importance of data quality and model interpretability. |
| 2023-05-02 | Using Counterfactuals to Understand Machine Learning Models | Blogger | XAI, Counterfactuals, Machine Learning | Explores the use of counterfactual explanations to gain insights into how machine learning models make decisions about molecules. |
| 2023-04-10 | Build a QSAR Model in 8 Lines of Python | Blogger | QSAR, Python, Machine Learning | Demonstrates the simplicity and power of modern Python tools for building predictive QSAR models with minimal code. |
| 2023-04-03 | Getting Inside the Mind of the Medicinal Chemist with Machine Learning | Blogger | Medicinal Chemistry, Machine Learning, SAR | Investigates how machine learning can be used to capture and augment the intuition of experienced medicinal chemists. |
| 2023-03-12 | Working With Drug Data from the ChEMBL Database | Blogger | ChEMBL, SQL, Data Mining | Provides a practical guide to querying and processing drug-related data from the ChEMBL database using SQL and Python. |
| 2023-02-01 | Generative Molecular Design - We Need to Raise the Bar | Blogger | Generative Models, Molecular Design | Calls for more rigorous evaluation and higher standards for generative molecular design methods to ensure they provide real value. |
| 2023-01-03 | AI in Drug Discovery 2022 - A Highly Opinionated Literature Review | Blogger | AI, Drug Discovery, Literature Review | A comprehensive review of the most significant papers and trends in AI for drug discovery published throughout 2022. |
| 2022-12-04 | Mining Ring Systems in Molecules for Fun and Profit | Blogger | Ring Systems, Data Mining, RDKit | Explores different methods for identifying and analyzing ring systems in large chemical datasets using RDKit. |
| 2022-03-28 | Clustering Fragment Screening Hits With a Self-Organizing Map | Blogger | Clustering, SOM, Fragment-based Design | Demonstrates the use of Self-Organizing Maps (SOMs) to organize and analyze results from fragment-based screening campaigns. |
| 2022-01-17 | The Solubility Forecast Index | Blogger | Solubility, Physicochemical Properties | Explains the development and utility of the Solubility Forecast Index (SFI) in early-stage drug discovery. |
| 2022-01-03 | Useful RDKit Utilities | Blogger | RDKit, Python, Utilities | A collection of practical RDKit-based utilities for common tasks like molecule cleaning, standardization, and property calculation. |
| 2021-11-30 | Picking the Highest Scoring Molecule(s) From Each Cluster | Blogger | Clustering, Selection, Virtual Screening | Provides a Python-based workflow for selecting diverse, high-scoring molecules from virtual screening results using clustering. |
| 2021-10-24 | Exploratory Data Analysis With mols2grid and Bemis-Murcko Frameworks | Blogger | EDA, mols2grid, Murcko Frameworks | Shows how to perform interactive EDA using the mols2grid library and scaffold analysis with Bemis-Murcko frameworks. |
| 2021-09-12 | Similarity Search and Some Cool Pandas Tricks | Blogger | Similarity Search, Pandas, Python | Combines chemical similarity searching with advanced Pandas techniques to efficiently process and analyze molecule data. |
| 2021-08-31 | Building a multiclass classification model | Blogger | Machine Learning, Classification, Multiclass | Walks through the process of building and evaluating a multiclass classification model for chemical properties or activities. |
| 2021-08-21 | Practical Cheminformatics - The Directory | Blogger | Directory, Index | An annotated guide and directory to the various resources and posts available on the Practical Cheminformatics blog. |
| 2021-07-27 | Viewing Clustered Chemical Structures in a Jupyter Notebook | Blogger | Visualization, Clustering, Jupyter | Demonstrates techniques for visualizing and exploring the results of chemical clustering directly within Jupyter notebooks. |
| 2021-07-07 | Automatic Analog Generation With Common R-group Replacements | Blogger | Analog Generation, R-groups, SAR | Presents a tool for automatically generating chemical analogs by performing common R-group substitutions on a lead scaffold. |
| 2021-06-03 | Assessing Interpretable Models | Blogger | XAI, Interpretable ML | Discusses methods for assessing the interpretability of machine learning models and why it matters in drug discovery. |
| 2021-03-30 | Fast Parallel Cheminformatics Workflows With Dask | Blogger | Dask, Parallel Processing, Python | Explains how to use the Dask library to parallelize and accelerate large-scale cheminformatics workflows in Python. |
| 2021-01-18 | AI in Drug Discovery 2020 - A Highly Opinionated Literature Review | Blogger | AI, Drug Discovery, Literature Review | A critical review of the key developments and publications in AI for drug discovery during the year 2020. |
| 2020-11-17 | A Highly Opinionated List of Open Source Cheminformatics Resources | Blogger | Resources, Open Source | A curated and annotated list of essential open-source software, libraries, and datasets for cheminformatics. |
| 2020-10-31 | What Do Molecules That Look LIke This Tend To Do? | Blogger | SAR, Similarity, Data Mining | Uses similarity search and data mining to identify the typical biological activities associated with specific chemical motifs. |
| 2020-10-12 | A Collection of Things I Frequently Forget How To Do With Seaborn Scatterplots | Blogger | Visualization, Seaborn, Python | A handy cheat sheet for advanced formatting and customization of scatterplots using the Seaborn library. |
| 2020-08-16 | Examining the Data From the ChEMBL SARS-CoV-2 Drug Repurposing Screens | Blogger | ChEMBL, SARS-CoV-2, Drug Repurposing | Analyzes early drug repurposing screening data for SARS-CoV-2 released through ChEMBL. |
| 2020-06-22 | Wicked Fast Cheminformatics with NVIDIA RAPIDS | Blogger | RAPIDS, GPU, Parallel Processing | Demonstrates how to use NVIDIA RAPIDS to significantly speed up cheminformatics tasks using GPU acceleration. |
| 2020-05-24 | Using the Structure-Activity Landscape Index (SALI) to Analyze Data From the SARS-CoV-2 MPro Screen | Blogger | SALI, SAR, SARS-CoV-2 | Applies SALI to identify activity cliffs and key SAR trends in SARS-CoV-2 MPro screening data. |
| 2020-05-13 | Some Thoughts on Comparing Classification Models | Blogger | Machine Learning, Classification, Metrics | Discusses the nuances of evaluating and comparing different classification algorithms for chemical datasets. |
| 2020-05-04 | Exploring the SARS-CoV-2 Main Protease (MPro) Structures | Blogger | SARS-CoV-2, MPro, Protein Structure | Provides an analysis of the first released crystal structures of the SARS-CoV-2 main protease. |
| 2020-04-27 | Positional Analogue Scanning | Blogger | SAR, Analogue Scanning, RDKit | Introduces a systematic approach for exploring the impact of substituent position on biological activity using RDKit. |
| 2020-04-11 | Adding Chemical Structures to a Recent COVID-19 Drug Repurposing Dataset | Blogger | COVID-19, Data Cleaning, SMILES | Documents the process of cleaning and augmenting a public COVID-19 dataset with proper chemical structures. |
| 2020-03-30 | Building on the Fragments From the Diamond/XChem SARS-CoV-2 Main Protease (MPro) Fragment Screen (Part II) Structure-Base Evaluation of Expanded Fragments | Blogger | Fragment-based Design, MPro, Structure-based Design | Evaluates fragment expansions for the SARS-CoV-2 main protease using structure-based modeling and docking. |
| 2020-03-25 | Building on the Fragments From the Diamond/XChem SARS-CoV-2 Main Protease (MPro) Fragment Screen (Part I) | Blogger | Fragment-based Design, MPro, SARS-CoV-2 | Discusses initial strategies for growing and merging fragment hits identified from a large-scale screen against the SARS-CoV-2 MPro. |
| 2020-03-21 | Benchmarking “One Molecular Fingerprint to Rule Them All” | Blogger | Fingerprints, Benchmarking | Benchmarks a variety of molecular fingerprints to see which perform best across a range of common cheminformatics tasks. |
| 2020-02-09 | How (Not) to Get a Job in Science - Part 2 - The Interview | Blogger | Career, Interviewing | Offers practical advice and common pitfalls to avoid during the scientific job interview process. |
| 2020-01-21 | How to (Not) Get a Job in Science | Blogger | Career, Job Search | Shares insights and tips on navigating the scientific job market, from applications to networking. |
| 2020-01-07 | Visualizing Decision Trees | Blogger | Visualization, Decision Trees, Machine Learning | Explains how to create clear and informative visualizations of decision tree models used for chemical classification. |
| 2019-11-13 | Interactive Plots with Chemical Structures | Blogger | Visualization, Interactive Plots, Bokeh | Shows how to build interactive data visualizations that display chemical structures on hover using Bokeh. |
| 2019-11-01 | Visualizing Chemical Space | Blogger | Chemical Space, Visualization, Dimensionality Reduction | Explores different dimensionality reduction techniques for visualizing the chemical space of large molecular libraries. |
| 2019-09-19 | Dissecting the Hype With Cheminformatics | Blogger | Hype, AI, Cheminformatics | Uses cheminformatics analysis to critically examine the hype surrounding AI in drug discovery. |
| 2019-07-28 | How Good Could (Should) My Models Be? | Blogger | Machine Learning, Performance, Error Bar | Discusses the theoretical and practical limits of model performance based on experimental data uncertainty. |
| 2019-06-02 | Using Reaction Transforms to Understand SAR | Blogger | SAR, Reaction Transforms, RDKit | Demonstrates the use of RDKit reaction transforms to explore and understand structure-activity relationships. |
| 2019-05-03 | Where’s the code? | Blogger | Open Source, GitHub | Discusses the importance of code sharing in science and provides links to the blog’s open-source repositories. |
| 2019-04-22 | Clustering 2.1 Million Compounds for $5 With a Little Help From Amazon & Facebook | Blogger | Clustering, AWS, Large Datasets | Shows how to perform large-scale chemical clustering affordably using cloud computing and efficient algorithms. |
| 2019-03-31 | Multiple Comparisons, Non-Parametric Statistics, and Post-Hoc Tests | Blogger | Statistics, Multiple Comparisons | A practical guide to using robust statistical methods when comparing multiple experimental groups or models. |
| 2019-03-03 | Plotting Distributions | Blogger | Visualization, Statistics, Seaborn | Explains best practices for visualizing the distributions of chemical properties and experimental data. |
| 2019-02-19 | Some Thoughts on Evaluating Predictive Models | Blogger | Machine Learning, Evaluation, Metrics | Discusses the selection of appropriate metrics for evaluating predictive models in different drug discovery contexts. |
| 2019-01-17 | My Response to Peter Kenny’s Comments on “AI in Drug Discovery - A Practical View From the Trenches” | Blogger | AI, Drug Discovery, Debate | A formal response to technical critiques regarding the author’s perspectives on AI in drug discovery. |
| 2019-01-11 | K-means Clustering | Blogger | Clustering, Machine Learning | Provides a basic introduction and implementation of K-means clustering for chemical data. |
| 2018-11-16 | AI in Drug Discovery - A Practical View From the Trenches | Blogger | AI, Drug Discovery, Implementation | Shares practical insights and lessons learned from implementing AI solutions in real-world drug discovery projects. |
| 2018-10-30 | Self-Organizing Maps - 90s Fad or Useful Tool? (Part 1) | Blogger | SOM, Clustering, Machine Learning | Re-evaluates the utility of Self-Organizing Maps (SOMs) for modern cheminformatics and data visualization tasks. |
| 2018-10-30 | Self-Organizing Maps - The Code (Part 2) | Blogger | SOM, Python, Implementation | Provides a step-by-step Python implementation of Self-Organizing Maps for chemical data analysis. |
| 2018-10-06 | My Science/Programming Journey | Blogger | Career, Personal | A personal reflection on the author’s career path at the intersection of science and programming. |
| 2018-09-29 | Assigning Bond Orders to PDB Ligands - The Easy Way | Blogger | PDB, Bond Orders, RDKit | Demonstrates an efficient method for correctly assigning bond orders to small molecules extracted from the Protein Data Bank (PDB). |
| 2018-09-24 | Some Notes From the 2018 RDKit UGM | Blogger | RDKit, Conference | Highlights and key takeaways from the 2018 RDKit User Group Meeting. |
| 2018-09-17 | A Few Updates to Free-Wilson | Blogger | Free-Wilson, SAR | Discusses modern updates and computational improvements to the classic Free-Wilson SAR analysis method. |
| 2018-09-05 | Predicting Aqueous Solubility - It’s Harder Than It Looks | Blogger | Solubility, Property Prediction | Explores the complexities and challenges of accurately predicting aqueous solubility for drug-like molecules. |
| 2018-08-20 | Scaffold Hopping? It’s Complicated | Blogger | Scaffold Hopping, SAR | Discusses the theory and practical difficulties of successful scaffold hopping in medicinal chemistry. |
| 2018-08-08 | Filtering Chemical Libraries | Blogger | Library Filtering, Drug-likeness | Reviews common rules and techniques for filtering large chemical libraries to identify drug-like leads. |
| 2018-06-08 | Cheating at Word Cookies with Python | Blogger | Python, Fun | A fun diversion demonstrating how to use Python to solve word puzzles. |
| 2018-05-30 | Free Wilson Analysis | Blogger | Free-Wilson, SAR | Explains the principles and application of Free-Wilson analysis for exploring structure-activity relationships. |
