Artificial Intelligence Methods for Enzyme Engineering

Author: Yang, Jason

Year: 2026

Degree: Dissertation (Ph.D.)

Advisors: Arnold, Frances Hamilton; Yue, Yisong

Committee Members: Wang, Zhen-Gang; Arnold, Frances Hamilton; Yue, Yisong; Gradinaru, Viviana

Option: Chemical Engineering

DOI: 10.7907/wqhh-dm06

Abstract

Proteins, such as specialized catalysts called enzymes, offer transformative potential for sustainable chemical synthesis, environmental remediation, and advanced therapeutics. However, engineering proteins for specific industrial or clinical functions remains a formidable challenge due to the vast, high-dimensional sequence design space. While directed evolution has facilitated significant breakthroughs, the methodology is often constrained by slow iteration cycles and the requirement for a functional starting point. This thesis introduces novel machine learning frameworks, integrated with experimental workflows, to address these limitations of traditional enzyme engineering approaches. We first present Active Learning-Assisted Directed Evolution (ALDE), which employs Bayesian optimization to optimize protein properties more efficiently. We then introduce Contrastive Reaction-Enzyme Pretraining (CREEP) for the annotation and discovery of enzymes with desired "new-to-nature" functionalities. Finally, we demonstrate Steering Generation for Protein Optimization (SGPO), a new paradigm that unifies these two perspectives into a holistic generative framework for efficient protein engineering. Collectively, these innovations advance the transition toward automated, artificial intelligence-driven biomolecular design, unlocking sustainable synthesis, novel therapeutics, and programmable biology at the molecular level.