Artificial Intelligence Methods for Enzyme Engineering
Author: Yang, Jason
Year: 2026
Degree: Dissertation (Ph.D.)
Advisors: Arnold, Frances Hamilton; Yue, Yisong
Committee Members: Wang, Zhen-Gang; Arnold, Frances Hamilton; Yue, Yisong; Gradinaru, Viviana
Option: Chemical Engineering
DOI: 10.7907/wqhh-dm06
Abstract
Proteins, such as specialized catalysts called enzymes, offer transformative potential for sustainable chemical synthesis, environmental remediation, and advanced therapeutics. However, engineering proteins for specific industrial or clinical functions remains a formidable challenge due to the expansive and high-dimensional sequence design space. While directed evolution has facilitated significant breakthroughs, the methodology is often constrained by slow iteration cycles and the requirement for a functional starting point. This thesis introduces novel machine learning frameworks, integrated with experimental workflows, to move beyond traditional enzyme engineering approaches. We first present Active Learning-Assisted Directed Evolution (ALDE), which uses Bayesian optimization to optimize protein properties more efficiently. We then introduce Contrastive Reaction-Enzyme Pretraining (CREEP) for the annotation and discovery of enzymes with desired "new-to-nature" functionalities. Finally, we demonstrate Steering Generation for Protein Optimization (SGPO), a new paradigm that unifies these two perspectives into a holistic generative framework for efficient protein engineering. Collectively, these innovations advance the transition toward automated, artificial intelligence-driven biomolecular design, unlocking sustainable synthesis, novel therapeutics, and programmable biology at the molecular level.