Advances in Computational Protein Design: Development of More Efficient Search Algorithms and their Application to the Full-Sequence Design of Larger Proteins

Author: Hom, Geoffrey Kai Tong

Year: 2005

Degree: Dissertation (Ph.D.)

Advisor: Mayo, Stephen L.

Committee Members: Deshaies, Raymond Joseph; Rees, Douglas C.; Pierce, Niles A.; Mayo, Stephen L.

Option: Biochemistry and Molecular Biophysics

DOI: 10.7907/M4R9-YM51

Abstract

Protein design is the art of choosing an amino acid sequence that will fold into a desired structure. Computational protein design aims to quantify and automate this process. In computational protein design, various metrics may be used to calculate an energy score for a sequence with respect to a desired protein structure. An ongoing challenge is to find the lowest-energy sequences among the vast number of possible sequences. A variety of exact and approximate algorithms may be used in this search.

The work in this thesis focuses on the development and testing of four search algorithms. The first algorithm, HERO, is an exact algorithm, meaning that it will always find the lowest-energy sequence if the algorithm converges. We show that HERO is faster than other exact algorithms and converges on some previously intractable designs. The second algorithm, Vegas, is an approximate algorithm, meaning that it may not find the lowest-energy sequence. We show that, under certain conditions, Vegas finds the lowest-energy sequence in less time than HERO. The third algorithm, Monte Carlo, is an approximate algorithm that had been developed previously. We tested whether Monte Carlo was thorough enough to handle a challenging computational design: the full-sequence design of a protein. Monte Carlo did not find the lowest-energy sequence, although a similar sequence from Vegas folded into the desired structure. Several biophysical methods suggested that the Monte Carlo sequence should also fold into the desired structure. Nevertheless, the structure of the Monte Carlo sequence, as determined by X-ray crystallography, was markedly different from the predicted structure. We attribute this discrepancy to the presence of a high concentration of dioxane in the crystallization conditions. The fourth algorithm, FC_FASTER, is an approximate algorithm for designs of fixed amino acid composition. Such designs may accelerate improvements to the physical model. We show that FC_FASTER finds lower-energy sequences and is faster than our current fixed-composition algorithm.
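The distinction between exact and approximate search drawn above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the four-letter alphabet, the random single-position and pairwise energy tables, and the function names are hypothetical, and the code does not reproduce HERO, Vegas, FC_FASTER, or the Monte Carlo implementation used in the thesis. Exhaustive enumeration stands in for an exact method only in the sense that it is guaranteed to return the lowest-energy sequence; a generic Metropolis Monte Carlo search stands in for an approximate method that may miss it.

```python
import itertools
import math
import random

ALPHABET = "AVLK"   # hypothetical reduced amino acid alphabet
N_POSITIONS = 6     # hypothetical number of designed positions


def make_toy_energy(seed=0):
    """Build a random single-body plus pairwise energy function (illustrative only)."""
    rng = random.Random(seed)
    single = [{a: rng.uniform(-1, 1) for a in ALPHABET} for _ in range(N_POSITIONS)]
    pair = {
        (i, j): {(a, b): rng.uniform(-1, 1) for a in ALPHABET for b in ALPHABET}
        for i in range(N_POSITIONS)
        for j in range(i + 1, N_POSITIONS)
    }

    def energy(seq):
        e = sum(single[i][seq[i]] for i in range(N_POSITIONS))
        e += sum(table[(seq[i], seq[j])] for (i, j), table in pair.items())
        return e

    return energy


def exhaustive_search(energy):
    """Exact by construction: scores every sequence, so the minimum is guaranteed."""
    return min(itertools.product(ALPHABET, repeat=N_POSITIONS), key=energy)


def monte_carlo_search(energy, steps=2000, temperature=0.5, seed=1):
    """Approximate: Metropolis sampling at fixed temperature; may miss the minimum."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in range(N_POSITIONS)]
    current_e = energy(current)
    best, best_e = tuple(current), current_e
    for _ in range(steps):
        pos = rng.randrange(N_POSITIONS)
        old = current[pos]
        current[pos] = rng.choice(ALPHABET)      # propose a single-position mutation
        new_e = energy(current)
        delta = new_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e                    # accept the move
            if current_e < best_e:
                best, best_e = tuple(current), current_e
        else:
            current[pos] = old                   # reject the move and restore
    return best, best_e


if __name__ == "__main__":
    energy = make_toy_energy()
    exact = exhaustive_search(energy)
    mc_seq, mc_e = monte_carlo_search(energy)
    print("exact:", "".join(exact), energy(exact))
    print("monte carlo:", "".join(mc_seq), mc_e)
```

In this toy setting the exhaustive search visits all 4^6 = 4096 sequences, which is feasible only because the problem is tiny; the number of sequences grows exponentially with the number of designed positions, which is why full-sequence designs call for more efficient exact algorithms or for approximate ones.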
