Practical Cheminformatics Index

Unified index of posts from Blogger and GitHub Pages.

Total posts: 87

Generated on 2026-04-11.

| Date | Title | Source | Keywords | Description |
| --- | --- | --- | --- | --- |
| 2026-02-07 | AI in Drug Discovery - Please Stop Fishing in the Bathtub! | GitHub | AI, Drug Discovery, Benchmarking | Critiques the misuse of the DUD-E dataset for validating machine learning models, arguing that models often “cheat” by recognizing simple chemical patterns rather than actual protein-ligand interactions. |
| 2026-01-17 | Practical Cheminformatics with Marimo | GitHub | Marimo, Jupyter, Python, Visualization | Introduces Marimo, a reactive Python notebook, as a superior alternative to Jupyter for cheminformatics with features like reactive execution and interactive UI elements. |
| 2026-01-09 | Can Gemini Search the ChEMBL Database? | GitHub | Gemini, LLM, ChEMBL, SQL | Evaluates the ability of the Gemini LLM to generate SQL queries for the ChEMBL database, noting that while it succeeds at query construction, data quality remains a limiting factor. |
| 2025-11-08 | Performing Exploratory Data Analysis on the OpenADMET ExpansionRx Blind Challenge Dataset | GitHub | EDA, OpenADMET, ADMET, Data Analysis | Provides a guide and notebook for performing EDA on the OpenADMET ExpansionRx dataset, emphasizing the importance of understanding data before modeling. |
| 2025-11-03 | We Still Haven’t Found What We’re Looking For - The Continuing Evolution of Protein-Ligand Co-Folding Methods | GitHub | Cofolding, Protein-Ligand, AlphaFold | Discusses the evolution of protein-ligand co-folding methods like PEARL and OpenFold3, noting their ongoing struggles with novel ligands and binding pockets. |
| 2025-09-20 | Just Because You Published It Doesn’t Mean It’s Right | GitHub | Reproducibility, Publication, Benchmarking | Critiques aqueous solubility modeling papers that use misleadingly “easy” benchmarks and advocates for higher quality, certified datasets. |
| 2025-09-15 | Time For a New Adventure | GitHub | Career | Announces the author’s new role as Chief Scientist at OpenADMET and outlines the initiative’s mission to improve drug metabolism and toxicity prediction through open science. |
| 2025-07-27 | Redoing the Boltz-1 Analysis of Orthosteric and Allosteric Ligand Cofolding with Boltz-2 | GitHub | Boltz-1, Boltz-2, Cofolding, Allosteric | Updates a previous analysis with the Boltz-2 model, concluding that allosteric pose prediction remains a significant challenge despite model improvements. |
| 2025-07-21 | Three Papers Demonstrating That Cofolding Still Has a Ways to Go | GitHub | Cofolding, Protein-Ligand | Summarizes three studies highlighting the limitations of current co-folding models, including their reliance on training set similarity and failure to identify allosteric sites. |
| 2025-05-18 | GNN’s can extrapolate for some properties, but there’s a trick | GitHub | GNN, Machine Learning, Extrapolation | Explains how Graph Neural Networks (GNNs) can extrapolate properties like molecular weight using “norm aggregation,” correcting a previous finding. |
| 2025-05-12 | Useful RDKit Utils - A Mötley Collection of Helpful Routines | GitHub | RDKit, Python, Utilities | Provides a collection of helpful Python routines for common cheminformatics tasks using the RDKit library. |
| 2025-05-06 | The Trouble With Tautomers | GitHub | Tautomers, RDKit, Cheminformatics | Discusses the challenges of handling tautomers in cheminformatics and demonstrates how to use RDKit for tautomer enumeration and standardization. |
| 2025-04-26 | Why Don’t Machine Learning Models Extrapolate? | GitHub | Machine Learning, Extrapolation | Explores the fundamental reasons why traditional machine learning models often fail to generalize beyond the chemical space of their training data. |
| 2025-04-26 | We’ve Moved | Blogger | General | Announces the blog’s move from Blogger to a new Markdown-based site on GitHub Pages. |
| 2025-03-07 | Even More Thoughts on ML Method Comparisons | Blogger | Machine Learning, Benchmarking | Emphasizes the need for rigorous statistical testing and better visualization when comparing machine learning models in drug discovery. |
| 2024-11-16 | Some Thoughts on Splitting Chemical Datasets | Blogger | Data Splitting, Machine Learning, Validation | Examines how different data-splitting strategies (Random, Scaffold, Butina, UMAP) affect the evaluation and generalization of machine learning models. |
| 2024-10-08 | Silly Things Large Language Models Do With Molecules | Blogger | LLM, SMILES, Cheminformatics | Critiques the use of general-purpose LLMs for molecule generation, showing they often rely on simple string manipulation rather than chemical understanding. |
| 2024-09-30 | Digging Deeper into Thompson Sampling - A Guest Blog Post by Patrick Riley | Blogger | Thompson Sampling, Active Learning, Optimization | Analyzes why certain search tasks, like docking, are significantly more difficult to optimize using Thompson Sampling than others. |
| 2024-05-22 | Generative Molecular Design Isn’t As Easy As People Make It Look | Blogger | Generative Models, Molecular Design | Discusses the extensive “child-proofing” and post-processing required to make generative AI models produce physically valid and chemically stable molecules. |
| 2024-01-16 | AI in Drug Discovery - A Highly Opinionated Literature Review (Part III) | Blogger | AI, Drug Discovery, Literature Review | Provides a curated list of review articles from 2023 covering property prediction, docking, and multi-objective optimization. |
| 2024-01-08 | AI in Drug Discovery - A Highly Opinionated Literature Review (Part II) | Blogger | AI, Drug Discovery, Literature Review | Reviews 2023 developments in LLMs, Active Learning, and Explainable AI within the context of drug discovery. |
| 2024-01-02 | AI in Drug Discovery 2023 - A Highly Opinionated Literature Review (Part I) | Blogger | AI, Drug Discovery, Literature Review | Focuses on 2023 advancements and critiques in docking, AlphaFold2, and benchmarking methodologies. |
| 2023-12-07 | Some Thoughts on Biotech vs Pharma for Computational Chemists | Blogger | Career, Biotech, Pharma | Compares career paths in biotech and pharma, discussing mentorship, software access, and the challenges of managing AI hype. |
| 2023-11-27 | Comparing Classification Models - You’re Probably Doing It Wrong | Blogger | Machine Learning, Classification, Metrics | Critiques common mistakes in comparing classification models and suggests more robust metrics and validation strategies. |
| 2023-08-03 | We Need Better Benchmarks for Machine Learning in Drug Discovery | Blogger | Benchmarking, Machine Learning, Drug Discovery | Advocates for the development of more realistic and challenging benchmarks that reflect the complexities of actual drug discovery projects. |
| 2023-07-05 | A Simple Tool for Exploring Structural Alerts | Blogger | Structural Alerts, Toxicity, Screening | Introduces a tool for identifying, visualizing, and understanding structural alerts to aid in toxicity and screening assessments. |
| 2023-06-12 | Getting Real with Molecular Property Prediction | Blogger | Property Prediction, Machine Learning | Discusses the practical challenges of predicting molecular properties and the importance of data quality and model interpretability. |
| 2023-05-02 | Using Counterfactuals to Understand Machine Learning Models | Blogger | XAI, Counterfactuals, Machine Learning | Explores the use of counterfactual explanations to gain insights into how machine learning models make decisions about molecules. |
| 2023-04-10 | Build a QSAR Model in 8 Lines of Python | Blogger | QSAR, Python, Machine Learning | Demonstrates the simplicity and power of modern Python tools for building predictive QSAR models with minimal code. |
| 2023-04-03 | Getting Inside the Mind of the Medicinal Chemist with Machine Learning | Blogger | Medicinal Chemistry, Machine Learning, SAR | Investigates how machine learning can be used to capture and augment the intuition of experienced medicinal chemists. |
| 2023-03-12 | Working With Drug Data from the ChEMBL Database | Blogger | ChEMBL, SQL, Data Mining | Provides a practical guide to querying and processing drug-related data from the ChEMBL database using SQL and Python. |
| 2023-02-01 | Generative Molecular Design - We Need to Raise the Bar | Blogger | Generative Models, Molecular Design | Calls for more rigorous evaluation and higher standards for generative molecular design methods to ensure they provide real value. |
| 2023-01-03 | AI in Drug Discovery 2022 - A Highly Opinionated Literature Review | Blogger | AI, Drug Discovery, Literature Review | A comprehensive review of the most significant papers and trends in AI for drug discovery published throughout 2022. |
| 2022-12-04 | Mining Ring Systems in Molecules for Fun and Profit | Blogger | Ring Systems, Data Mining, RDKit | Explores different methods for identifying and analyzing ring systems in large chemical datasets using RDKit. |
| 2022-03-28 | Clustering Fragment Screening Hits With a Self-Organizing Map | Blogger | Clustering, SOM, Fragment-based Design | Demonstrates the use of Self-Organizing Maps (SOMs) to organize and analyze results from fragment-based screening campaigns. |
| 2022-01-17 | The Solubility Forecast Index | Blogger | Solubility, Physicochemical Properties | Explains the development and utility of the Solubility Forecast Index (SFI) in early-stage drug discovery. |
| 2022-01-03 | Useful RDKit Utilities | Blogger | RDKit, Python, Utilities | A collection of practical RDKit-based utilities for common tasks like molecule cleaning, standardization, and property calculation. |
| 2021-11-30 | Picking the Highest Scoring Molecule(s) From Each Cluster | Blogger | Clustering, Selection, Virtual Screening | Provides a Python-based workflow for selecting diverse, high-scoring molecules from virtual screening results using clustering. |
| 2021-10-24 | Exploratory Data Analysis With mols2grid and Bemis-Murcko Frameworks | Blogger | EDA, mols2grid, Murcko Frameworks | Shows how to perform interactive EDA using the mols2grid library and scaffold analysis with Bemis-Murcko frameworks. |
| 2021-09-12 | Similarity Search and Some Cool Pandas Tricks | Blogger | Similarity Search, Pandas, Python | Combines chemical similarity searching with advanced Pandas techniques to efficiently process and analyze molecule data. |
| 2021-08-31 | Building a multiclass classification model | Blogger | Machine Learning, Classification, Multiclass | Walks through the process of building and evaluating a multiclass classification model for chemical properties or activities. |
| 2021-08-21 | Practical Cheminformatics - The Directory | Blogger | Directory, Index | An annotated guide and directory to the various resources and posts available on the Practical Cheminformatics blog. |
| 2021-07-27 | Viewing Clustered Chemical Structures in a Jupyter Notebook | Blogger | Visualization, Clustering, Jupyter | Demonstrates techniques for visualizing and exploring the results of chemical clustering directly within Jupyter notebooks. |
| 2021-07-07 | Automatic Analog Generation With Common R-group Replacements | Blogger | Analog Generation, R-groups, SAR | Presents a tool for automatically generating chemical analogs by performing common R-group substitutions on a lead scaffold. |
| 2021-06-03 | Assessing Interpretable Models | Blogger | XAI, Interpretable ML | Discusses methods for assessing the interpretability of machine learning models and why it matters in drug discovery. |
| 2021-03-30 | Fast Parallel Cheminformatics Workflows With Dask | Blogger | Dask, Parallel Processing, Python | Explains how to use the Dask library to parallelize and accelerate large-scale cheminformatics workflows in Python. |
| 2021-01-18 | AI in Drug Discovery 2020 - A Highly Opinionated Literature Review | Blogger | AI, Drug Discovery, Literature Review | A critical review of the key developments and publications in AI for drug discovery during the year 2020. |
| 2020-11-17 | A Highly Opinionated List of Open Source Cheminformatics Resources | Blogger | Resources, Open Source | A curated and annotated list of essential open-source software, libraries, and datasets for cheminformatics. |
| 2020-10-31 | What Do Molecules That Look LIke This Tend To Do? | Blogger | SAR, Similarity, Data Mining | Uses similarity search and data mining to identify the typical biological activities associated with specific chemical motifs. |
| 2020-10-12 | A Collection of Things I Frequently Forget How To Do With Seaborn Scatterplots | Blogger | Visualization, Seaborn, Python | A handy cheat sheet for advanced formatting and customization of scatterplots using the Seaborn library. |
| 2020-08-16 | Examining the Data From the ChEMBL SARS-CoV-2 Drug Repurposing Screens | Blogger | ChEMBL, SARS-CoV-2, Drug Repurposing | Analyzes early drug repurposing screening data for SARS-CoV-2 released by the ChEMBL database. |
| 2020-06-22 | Wicked Fast Cheminformatics with NVIDIA RAPIDS | Blogger | RAPIDS, GPU, Parallel Processing | Demonstrates how to use NVIDIA RAPIDS to significantly speed up cheminformatics tasks using GPU acceleration. |
| 2020-05-24 | Using the Structure-Activity Landscape Index (SALI) to Analyze Data From the SARS-CoV-2 MPro Screen | Blogger | SALI, SAR, SARS-CoV-2 | Applies the SALI index to identify activity cliffs and key SAR trends in SARS-CoV-2 MPro screening data. |
| 2020-05-13 | Some Thoughts on Comparing Classification Models | Blogger | Machine Learning, Classification, Metrics | Discusses the nuances of evaluating and comparing different classification algorithms for chemical datasets. |
| 2020-05-04 | Exploring the SARS-CoV-2 Main Protease (MPro) Structures | Blogger | SARS-CoV-2, MPro, Protein Structure | Provides an analysis of the first released crystal structures of the SARS-CoV-2 main protease. |
| 2020-04-27 | Positional Analogue Scanning | Blogger | SAR, Analogue Scanning, RDKit | Introduces a systematic approach for exploring the impact of substituent position on biological activity using RDKit. |
| 2020-04-11 | Adding Chemical Structures to a Recent COVID-19 Drug Repurposing Dataset | Blogger | COVID-19, Data Cleaning, SMILES | Documents the process of cleaning and augmenting a public COVID-19 dataset with proper chemical structures. |
| 2020-03-30 | Building on the Fragments From the Diamond/XChem SARS-CoV-2 Main Protease (MPro) Fragment Screen (Part II) Structure-Base Evaluation of Expanded Fragments | Blogger | Fragment-based Design, MPro, Structure-based Design | Evaluates fragment expansions for the SARS-CoV-2 main protease using structure-based modeling and docking. |
| 2020-03-25 | Building on the Fragments From the Diamond/XChem SARS-CoV-2 Main Protease (MPro) Fragment Screen (Part I) | Blogger | Fragment-based Design, MPro, SARS-CoV-2 | Discusses initial strategies for growing and merging fragment hits identified from a large-scale screen against the SARS-CoV-2 MPro. |
| 2020-03-21 | Benchmarking “One Molecular Fingerprint to Rule Them All” | Blogger | Fingerprints, Benchmarking | Benchmarks a variety of molecular fingerprints to see which perform best across a range of common cheminformatics tasks. |
| 2020-02-09 | How (Not) to Get a Job in Science - Part 2 - The Interview | Blogger | Career, Interviewing | Offers practical advice and common pitfalls to avoid during the scientific job interview process. |
| 2020-01-21 | How to (Not) Get a Job in Science | Blogger | Career, Job Search | Shares insights and tips on navigating the scientific job market, from applications to networking. |
| 2020-01-07 | Visualizing Decision Trees | Blogger | Visualization, Decision Trees, Machine Learning | Explains how to create clear and informative visualizations of decision tree models used for chemical classification. |
| 2019-11-13 | Interactive Plots with Chemical Structures | Blogger | Visualization, Interactive Plots, Bokeh | Shows how to build interactive data visualizations that display chemical structures on hover using Bokeh. |
| 2019-11-01 | Visualizing Chemical Space | Blogger | Chemical Space, Visualization, Dimensionality Reduction | Explores different dimensionality reduction techniques for visualizing the chemical space of large molecular libraries. |
| 2019-09-19 | Dissecting the Hype With Cheminformatics | Blogger | Hype, AI, Cheminformatics | Uses cheminformatics analysis to critically examine the hype surrounding AI in drug discovery. |
| 2019-07-28 | How Good Could (Should) My Models Be? | Blogger | Machine Learning, Performance, Error Bar | Discusses the theoretical and practical limits of model performance based on experimental data uncertainty. |
| 2019-06-02 | Using Reaction Transforms to Understand SAR | Blogger | SAR, Reaction Transforms, RDKit | Demonstrates the use of RDKit reaction transforms to explore and understand structure-activity relationships. |
| 2019-05-03 | Where’s the code? | Blogger | Open Source, GitHub | Discusses the importance of code sharing in science and provides links to the blog’s open-source repositories. |
| 2019-04-22 | Clustering 2.1 Million Compounds for $5 With a Little Help From Amazon & Facebook | Blogger | Clustering, AWS, Large Datasets | Shows how to perform large-scale chemical clustering affordably using cloud computing and efficient algorithms. |
| 2019-03-31 | Multiple Comparisons, Non-Parametric Statistics, and Post-Hoc Tests | Blogger | Statistics, Multiple Comparisons | A practical guide to using robust statistical methods when comparing multiple experimental groups or models. |
| 2019-03-03 | Plotting Distributions | Blogger | Visualization, Statistics, Seaborn | Explains best practices for visualizing the distributions of chemical properties and experimental data. |
| 2019-02-19 | Some Thoughts on Evaluating Predictive Models | Blogger | Machine Learning, Evaluation, Metrics | Discusses the selection of appropriate metrics for evaluating predictive models in different drug discovery contexts. |
| 2019-01-17 | My Response to Peter Kenny’s Comments on “AI in Drug Discovery - A Practical View From the Trenches” | Blogger | AI, Drug Discovery, Debate | A formal response to technical critiques regarding the author’s perspectives on AI in drug discovery. |
| 2019-01-11 | K-means Clustering | Blogger | Clustering, Machine Learning | Provides a basic introduction and implementation of K-means clustering for chemical data. |
| 2018-11-16 | AI in Drug Discovery - A Practical View From the Trenches | Blogger | AI, Drug Discovery, Implementation | Shares practical insights and lessons learned from implementing AI solutions in real-world drug discovery projects. |
| 2018-10-30 | Self-Organizing Maps - 90s Fad or Useful Tool? (Part 1) | Blogger | SOM, Clustering, Machine Learning | Re-evaluates the utility of Self-Organizing Maps (SOMs) for modern cheminformatics and data visualization tasks. |
| 2018-10-30 | Self-Organizing Maps - The Code (Part 2) | Blogger | SOM, Python, Implementation | Provides a step-by-step Python implementation of Self-Organizing Maps for chemical data analysis. |
| 2018-10-06 | My Science/Programming Journey | Blogger | Career, Personal | A personal reflection on the author’s career path at the intersection of science and programming. |
| 2018-09-29 | Assigning Bond Orders to PDB Ligands - The Easy Way | Blogger | PDB, Bond Orders, RDKit | Demonstrates an efficient method for correctly assigning bond orders to small molecules extracted from the Protein Data Bank (PDB). |
| 2018-09-24 | Some Notes From the 2018 RDKit UGM | Blogger | RDKit, Conference | Highlights and key takeaways from the 2018 RDKit User Group Meeting. |
| 2018-09-17 | A Few Updates to Free-Wilson | Blogger | Free-Wilson, SAR | Discusses modern updates and computational improvements to the classic Free-Wilson SAR analysis method. |
| 2018-09-05 | Predicting Aqueous Solubility - It’s Harder Than It Looks | Blogger | Solubility, Property Prediction | Explores the complexities and challenges of accurately predicting aqueous solubility for drug-like molecules. |
| 2018-08-20 | Scaffold Hopping? It’s Complicated | Blogger | Scaffold Hopping, SAR | Discusses the theory and practical difficulties of successful scaffold hopping in medicinal chemistry. |
| 2018-08-08 | Filtering Chemical Libraries | Blogger | Library Filtering, Drug-likeness | Reviews common rules and techniques for filtering large chemical libraries to identify drug-like leads. |
| 2018-06-08 | Cheating at Word Cookies with Python | Blogger | Python, Fun | A fun diversion demonstrating how to use Python to solve word puzzles. |
| 2018-05-30 | Free Wilson Analysis | Blogger | Free-Wilson, SAR | Explains the principles and application of Free-Wilson analysis for exploring structure-activity relationships. |