We Still Haven’t Found What We’re Looking For - The Continuing Evolution of Protein-Ligand Co-Folding Methods
More Every Day
Last week’s NVIDIA GPU Technology Conference (GTC) featured two announcements that highlighted both the potential and ongoing challenges of protein-ligand co-folding. The recently renamed Genesis Molecular AI announced PEARL, a proprietary co-folding method. Additionally, the OpenFold consortium released the code and model weights for a preview of OpenFold3 (OF3p). Both groups also provided technical reports with initial benchmarks. Besides co-folding, the OpenFold team also shared structure prediction results for protein monomers and complexes, as well as antibody-antigen complexes and RNA monomers.
While OF3p aims to faithfully reproduce AlphaFold3, PEARL seeks to push the current state of the art by incorporating “physics-based synthetic data” into its training set. Unfortunately, the source and characteristics of this synthetic data have not yet been disclosed; hopefully, Genesis will reveal details in future publications. Another major advance in PEARL is its ability to condition on project-specific data. The technical report showed significant performance improvements when publicly available PDB structures were used for conditioning.
Welcome to the Jungle
I was pleased to see that both groups provided benchmark data using the Runs N’ Poses dataset developed by Torsten Schwede’s team at the SIB Swiss Institute of Bioinformatics. As I mentioned in a previous post, the protein-ligand complexes in the Runs N’ Poses benchmark are classified by their similarity to the structures used to train the co-folding model, assessed through protein sequence identity, binding pocket similarity, and ligand similarity. As the Schwede group demonstrated in their paper, the performance of co-folding methods declines significantly as the similarity to the training set decreases.
While PEARL and OF3p posted impressive results on several benchmarks, their performance on Runs N’ Poses highlights an ongoing challenge: co-folding methods still struggle to generalize beyond their training data. The plots below, from the PEARL and OF3p technical reports, show that all methods have difficulty predicting accurate ligand poses for protein-ligand complexes that differ from those in the training set. In these plots, the x-axis groups test set compounds by their similarity to the training set, while the y-axis shows the fraction of predictions in which the ligand RMSD is within 2 Å of the known experimental structure.
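To make the y-axis metric concrete, here is a minimal sketch (not the evaluators’ actual code) of how a 2 Å success rate can be computed from per-complex ligand RMSDs. The RMSD values below are invented for illustration; in a real evaluation they would come from a symmetry-aware comparison of each predicted ligand pose against the crystal structure.

```python
import numpy as np

def success_rate(rmsds, threshold=2.0):
    """Fraction of predicted ligand poses within `threshold` Å RMSD
    of the experimental pose."""
    rmsds = np.asarray(rmsds, dtype=float)
    return float((rmsds <= threshold).mean())

# Hypothetical RMSDs (Å) for two similarity bins of a benchmark
similar_to_train = [0.8, 1.2, 3.5, 0.9, 1.7]      # pocket/ligand seen in training
dissimilar_to_train = [4.2, 1.9, 6.8, 3.1, 5.5]   # novel pocket and ligand

print(success_rate(similar_to_train))      # 0.8
print(success_rate(dissimilar_to_train))   # 0.2
```

Plotting this fraction per similarity bin reproduces the shape of the benchmark figures: the bars shrink as the test complexes become less like the training set.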
The figure below, from the PEARL technical report, shows that although the method advances the current state of the art, performance still drops significantly for pockets and ligands that differ from the training set. Note that the results displayed are the best RMSD from the top five solutions generated by each co-folding method.
The results below, from the OF3p technical report, show a similar trend. In the bar plots below, “Oracle” represents the best of the top five solutions (as shown above), while “Ranked” indicates the RMSD of the best-scoring solution. Once again, we see a significant decline in performance as the similarity to the training set decreases.
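The “Oracle” versus “Ranked” distinction is easy to illustrate with a toy sketch. This is not the code used in either report, and the (RMSD, confidence) pairs below are invented; the point is that a good pose can exist among the sampled solutions (Oracle) even when the model’s own confidence score fails to select it (Ranked).

```python
def oracle_rmsd(poses):
    """'Oracle': best (lowest) RMSD among the generated poses."""
    return min(rmsd for rmsd, score in poses)

def ranked_rmsd(poses):
    """'Ranked': RMSD of the pose the model's confidence score ranks first
    (here, a higher score means more confident)."""
    return max(poses, key=lambda p: p[1])[0]

# Hypothetical (rmsd_angstrom, confidence_score) pairs for five sampled poses
poses = [(3.4, 0.91), (1.1, 0.62), (5.0, 0.40), (2.6, 0.75), (1.8, 0.55)]

print(oracle_rmsd(poses))  # 1.1 -- an accurate pose was sampled...
print(ranked_rmsd(poses))  # 3.4 -- ...but the confidence score picked a worse one
```

The gap between the two numbers is a rough measure of how much performance is lost to pose ranking rather than pose generation.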
A Missed Opportunity
One aspect I wish both technical reports had included was performance on allosteric binding sites. As reported in a recent paper by Eva Nittinger and coworkers at AstraZeneca (AZ), Boltz-1, RoseTTAFold, and NeuralPlexer struggled to accurately reproduce the poses of allosteric ligands. I repeated the AZ team’s work with Boltz-2 and obtained similar results: most allosteric ligands were placed in the orthosteric binding site, which typically lies 10-20 Å from the experimentally observed binding pose. While I don’t believe anyone has a clear explanation for why allosteric inhibitors are poorly predicted, many attribute it to the limited amount of training data. The Protein Data Bank (PDB) contains hundreds of structures of ligands bound to orthosteric sites but only a few structures of ligands bound to allosteric sites, and these algorithms may simply be overwhelmed by the comparatively large number of orthosteric examples. Either way, this is an important issue for the field and should be part of standard benchmarks. I don’t have access to PEARL, so I can’t run it on the AZ team’s allosteric benchmark; hopefully, the team at Genesis will do this and publish the results. However, now that the code and weights for OF3p are available, I can run that benchmark myself. Please stay tuned for the results.
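For readers who want to run this kind of check themselves, one simple (and admittedly crude) way to ask which pocket a predicted pose landed in is to compare the ligand centroid against reference centroids for the known orthosteric and allosteric sites. This is a hypothetical sketch, not the AZ team’s protocol, and all coordinates below are made up for illustration.

```python
import numpy as np

def nearest_site(ligand_coords, site_centers):
    """Assign a predicted ligand pose to the closest named binding site
    by centroid-to-centroid distance (Å)."""
    centroid = np.mean(np.asarray(ligand_coords, dtype=float), axis=0)
    dists = {name: float(np.linalg.norm(centroid - np.asarray(center)))
             for name, center in site_centers.items()}
    best = min(dists, key=dists.get)
    return best, dists[best]

# Hypothetical site centroids (Å): the ligand was observed in the allosteric
# site, but the predicted pose below sits in the orthosteric pocket
sites = {"orthosteric": [10.0, 10.0, 10.0], "allosteric": [22.0, 18.0, 14.0]}
predicted_pose = [[9.5, 10.2, 10.1], [10.4, 9.8, 9.9], [10.1, 10.3, 10.2]]

site, dist = nearest_site(predicted_pose, sites)
print(site)  # orthosteric
```

Tallying this assignment across a set of allosteric complexes gives a quick count of how often a method defaults to the orthosteric pocket, independent of the exact RMSD.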
What Next?
While the field still faces challenges, it’s exciting to see the rapid pace of recent advancements. I’m especially encouraged by efforts like OpenFold3 and Boltz-2, which are releasing top-tier models as open-source software. It will be interesting to see where the field goes next. As I see it, two clear trends are emerging. The first involves using synthetic data generated through physics-based methods to supplement existing data from the Protein Data Bank (PDB). Although the Genesis team didn’t specify how their synthetic data was created, this will likely motivate others to try similar approaches. One thing has become clear in today’s AI age: even if you don’t share exactly how you did something, simply doing it will inspire others to figure it out.
Another way to enhance co-folding performance is by increasing access to experimental data. Although pharmaceutical companies possess thousands of crystal structures of protein-ligand complexes, intellectual property concerns prevent these structures from being publicly shared. The AI Structural Biology (AISB) Network, led by Apheris, addresses this by creating a federated learning infrastructure that allows co-folding models to improve through secure training on proprietary data that never leaves its owner’s control. I see this as going deep rather than broad. The PDB is broad: it contains structures of thousands of different proteins, but, with some exceptions, only a few structures of ligands bound to each unique protein. In contrast, structural data from pharmaceutical companies is deep. A company typically studies a few specific proteins each year, but a drug discovery project for one of those targets can generate hundreds of structures, often drawn from only a few related chemical series. It will be interesting to see how large collections of highly related structures can enhance co-folding methods.
Another initiative with the potential to greatly increase the number of publicly accessible structures is the OpenBind project, led by Oxford University and the Diamond Light Source. This project, which aims to produce 500,000 X-ray structures over the next five years, could supply crucial data for developing the next generation of co-folding algorithms.
Although we’re still a long way from predicting the structure of any ligand bound to any protein, the field continues to advance rapidly. I’m eager to see where we’ll be in a few months.
