Lukas Heuberger, Daniel Messmer, et al.
Advanced Science
Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build trustworthy models. We considered the effect of different design choices in the development of binary predictors of peptide bioactivity and found that the choice of negative peptides and the use of homology-based partitioning strategies when constructing the evaluation set have a significant impact on perceived model performance, providing a more realistic estimate of how the model performs when exposed to new data. We also show that using protein language models to generate peptide representations can both simplify the computational pipelines and improve model performance, and that state-of-the-art protein language models perform similarly regardless of size or architecture. Finally, we integrate these results into an easy-to-use AutoML tool that supports the development of new, robust predictive models for peptide bioactivity by biologists without strong machine learning expertise. Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML, and a dedicated web server is available at http://peptide.ucd.ie/AutoPeptideML.
Nathaniel Park, Tiffany Callahan, et al.
arXiv
Laura Gardiner, Ritesh Krishna
Nat. Food
Tiffany Callahan, Kevin Cheng, et al.
ACS Spring 2025